Since these need to be refcounted, we throw them directly into struct
mp_image instead of being part of mp_colorspace. Even though they would
semantically make more sense in mp_colorspace, having them there is
really awkward because mp_colorspace is passed around and stored a lot,
and this way their lifetime is exactly tied to the lifetime of the
mp_image associated with it.
Can be enabled via --vd-lavc-dr=yes. See manpage additions for what it
does.
This reminds of the MPlayer -dr flag, but the implementation is
completely different. It's the same basic concept: letting the decoder
render into a GPU buffer to avoid a copy. Unlike MPlayer, this doesn't
try to go through filters (libavfilter doesn't support this anyway).
Unless a filter can work in-place, DR will be silently disabled. MPlayer
had very complex semantics about buffer types and management (which
apparently nobody ever understood) and weird restrictions that mostly
limited it to mpeg2 style codecs. The mpv code does not do any of this,
and just lets the decoder allocate an arbitrary number of untyped
images. (No MPlayer code was used.)
Parts of the code based on work by atomnuker (starting point for the
generic code) and haasn (some GL definitions, some basic PBO code, and
correct fencing).
Remove this code because it could be argued that it contains GPL-only
code (see commit 642e963c86 for details).
The remaining aspect methods appear to work just as well, are
potentially more compatible to other players, and the code becomes much
simpler.
Commit d5702d3b95 messed up the order of destruction of the elements: it
destroyed the avctx before the hwaccel uninit, even though the hwaccel
uninit could access avctx. This could happen with some old hwaccels
only, such as D3D ones before the new libavcodec hwaccel API.
Fix this by making use of the fact that avcodec_flush_buffers() will
uninit the underlying hwaccel. Thus we can be sure avctx is not using
any hwaccel objects anymore, and it's safe to uninit the hwaccel.
Move the hwdec_dev dstruction code with it, because otherwise we would
in theory potentially create some dangling pointers in avctx.
This sets AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH, which some hwaccels
using the new generic API respect. These do profile selection in
libavcodec, so it can be controlled only with an external flag, instead
of in mpv code like it used to be done.
They have been deprecated for a decade, yet you're forced to explicitly
deal with them at every step, or they will break your shit.
FFmpeg insists on keeping them, because libavfilter is too stupid to
deal with color ranges properly. Ridiculous.
For some braindead reason, Microsoft decided to prevent you from
dynamically loading system libraries. This makes portability harder.
And we're talking about portability between Microsoft OSes!
This partially reverts the change from a longer time ago to always build
DXVA2 and D3D11VA together.
To make it simpler, we change the following:
- building with ANGLE headers is now required to build D3D hwaccels
- if DXVA2 is enabled, D3D11VA is still forcibly built
- the CLI vo_opengl ANGLE backend is now under --egl-angle-win32
This is done to reduce the dependency mess slightly.
MaxCLL is the more authoritative source for the metadata we are
interested in. The use of mastering metadata is sort of a hack anyway,
since there's no clearly-defined relationship between the mastering peak
brightness and the actual content. (Unlike MaxCLL, which is an explicit
relationship)
Also move the parameter fixing to `fix_image_params`
I don't know if the avutil check is strictly necessary but I've included
it anyway to be on the safe side.
List of changes:
1. Kill nom_peak, since it's a pointless non-field that stores nothing
of value and is _always_ derived from ref_white anyway.
2. Kill ref_white/--target-brightness, because the only case it really
existed for (PQ) actually doesn't need to be this general: According
to ITU-R BT.2100, PQ *always* assumes a reference monitor with a
white point of 100 cd/m².
3. Improve documentation and comments surrounding this stuff.
4. Clean up some of the code in general. Move stuff where it belongs.
"Almost" because this might contain copyright by michael, who agreed
with LGPL, but only once the core is LGPL. This is preparation for that
to happen.
Apart from that, the usual remarks apply. In particular, dec_video.c
started out quite chaotic with no modularization, but was later
basically gutted, and in general rewritten a bunch of times. Not going
to give a history lesson.
Special attention needs to be given to 3 patches by cehosos, who did not
agree to the relicensing:
240b743ebdf: --field-dominance
e32cbbf7dc3: reinit VO if aspect ratio changes
306f6243fdf: use container aspect if codec aspect unset (?)
The first patch is pretty clearly still in the current code, and needs
to be disabled for LGPL.
The functionality of the second patch is still active, but implemented
completely different, and as part of general frame parameter changes (at
the time of the patch, MPlayer already reinitialized the VO on frame
size and pixel format changes - all this was merged into a single check
for changing image parameters).
The third patch makes me a bit more uncomfortable. It appears the code
was moved to dec_video.c in de68b8f23c, and further changed in
82f0d373, 0a0bb905, and bf13bd0d. You could claim that cehoyos'
copyright still sticks. Fortunately, we implement alternative aspect
detection, which is simpler and probably preferable, and which arguably
contains none of the original code and logic, and thus should be fully
safe.
While I don't know if cehoyos' copyright actually still applies, I'm
more comfortable with making the code GPL-only for now. Also change the
default to use the (in future) plain LGPL code, and deprecate the one
associated with the GPL code, so we can eventually remove the GPL code.
But it's also possible we decide that the copyright doesn't apply, and
undo the deprecation and GPL guards.
I expect that users won't notice anything. If you ask me, the old aspect
method was probably an accidental bug instead of intentional behavior.
Although, the new aspect method was broken too, so I had to fix it.
iive agreed only to LGPL 3.0+ only. All of his relevant changes are for
XvMC, for which mpv completely dropped support back in mplayer2 times.
But you could claim that the get_format code represents some residual
copyright (everything else added by iive was removed, only get_format
still is around in some form). While I doubt that this is relly
copyright-relevant, consider it is for now.
michael is the original author of vd_lavc.c, so this file can become
LGPL only after the core becomes LGPL.
cehoyos did not agree with the LGPL relicensing, but all of his code is
gone.
Some others could not be reached, but their code is gone as well. In
particular, vdpau support, which was originally done by Nvidia, had
larger impact on vd_lavc.c, but vdpau support was first refactored a few
times (for the purpose of modularization) and moved to different files,
and then decoding was completely moved to libavcodec.
Lastly, assigning the "opaque" field was moved by Gwenole Beauchesne in
commit 8e5edec13e. Agreement is pending (due to copyright apparently
owned by the author's employer). So just undo the change, so we don't
have to think about whether the change is copyrightable.
Unfortunately quite a mess, in particular due to the need to have some
compatibility with the old API. (The old API will be supported only in
short term.)
The new API has literally no advantages (other than that we can drop
mp_vt_download_image and other things later), but it's sort-of uniform
with the other hwaccels.
"--videotoolbox-format=no" is not supported with the new API, because it
doesn't "fit in". Probably could be added later again.
The iOS code change is untested (no way to test).
It's not really guaranteed that other components always set this (e.g.
on subtle errors), so check it explicitly. Although I'm not aware of a
failing case.
If vo_opengl is used, and vo_opengl already created the vdpau interop
(for whatever reasons), and then preemption happens, and then you try to
enable hw decoding, it failed. The reason was that preemption recovery
is not run at any point before libavcodec accesses the vdpau device.
The actual impact was that with libmpv + opengl-cb use, hardware
decoding was permanently broken after display mode switching (something
that caused the display to get preempted at least with older drivers).
With mpv CLI, you can for example enable hw decoding during playback,
then disable it, VT switch to console, switch back to X, and try to
enable hw decoding again.
This is mostly because libav* does not deal with preemption, and NVIDIA
driver preemption behavior being horrible garbage. In addition to being
misdesigned API, the preemption callback is not called before you try to
access vdpau API, and then only with _some_ accesses.
In summary, the preemption callback was never called, neither before nor
after libavcodec tried to init the decoder. So we have to get
mp_vdpau_handle_preemption() called before libavcodec accesses it. This
in turn will do a dummy API access which usually triggers the preemption
callback immediately (with NVIDIA's drivers).
In addition, we have to update the AVHWDeviceContext's device. In theory
it could change (in practice it usually seems to use handle "0").
Creating a new device would cause chaos, as we don't have a concept of
switching the device context on the fly. So we simply update it
directly. I'm fairly sure this violates the libav* API, but it's the
best we can do.
Until now, the texture pointer was stored in plane 1, and the texture
array index was in plane 2. Move this down to plane 0 and plane 1. This
is to align it to the new WIP D3D11 decoding API in Libav, where we
decided that there is no reason to avoid setting plane 0, and that it
would be less weird to start at plane 0.
Sigh...
The functionality is not actually needed for vdpau, but if the vdpau
hwaccel is present, the FFmpeg version is new enough that it includes
the field.
These decoders can select the decoding device with hw_device_ctx, but
don't use hw_frames_ctx (at least not in a meaningful way).
Currently unused, but intended to be used for cuvid, as soon as it hits
ffmpeg git master.
Also make the vdpau and vaapi hwaccel definition structs static, as we
have removed the old code which would have had clashing external
declarations.
This drops support for the old libavcodec APIs. Now FFmpeg 3.3 or FFmpeg
git is required. Libav has no release with the new APIs yet, so for
Libav git as of a few weeks or months ago or so is required if you want
to use Libav.
Not much actually changes in hwdec_vaegl.c - some code is removed, but
the reindentation inflates the diff.
If vaapi was found, but neither the old or new libavcodec vaapi hwaccel
API, then HAVE_VAAPI_HWACCEL will be defined, but not _OLD or _NEW. The
HAVE_VAAPI_HWACCEL define pretty much exists only for acrobatics with
our own waf dependency checker helper code.
The new API works like the new vaapi API, using generic hwaccel support.
One minor detail is the error message that will be printed if using
non-4:2:0 surfaces (which as far as I can tell is completely broken in
the nVidia drivers and thus not supported by mpv). The HEVC warning
(which is completely broken in the nVidia drivers but should work with
Mesa) had to be added to the generic hwaccel code.
This also trashes display preemption recovery. Fuck that. It never
really worked. If someone complains, I might attempt to add it back
somehow.
This is the 4th iteration of the libavcodec vdpau API (after the
separate decoder API, the manual hwaccel API, and the automatic vdpau
hwaccel API). Fortunately, further iterations will be generic, and not
require much vdpau-specific changes (if any at all).
I guess that's the full extent I still care about nVidia's broken
garbage. In theory, we could always force the video mixer (which is the
only method getting the video data that works), but why bother.
FFmpeg could crash with vaapi (new) and --vo=opengl + interpolation.
It seems the actual surface count the old vaapi code uses (and which
usually never exceeded the preallocated amount) was higher than what
was used for the new vaapi code, so just correct that. The d3d helpers
also had weird code that bumped the real pool size, fix them as well.
Why this would result in an assertion failure instead of a proper error,
who knows.
hw_vaapi.c didn't do much interesting anymore. Other than the function
to create a device for decoding with vaapi-copy, everything can be done
by generic code. Other libavcodec hwaccels are planned to provide the
same API as vaapi. It will be possible to drop the other hw_ files in
the future. They will use this generic code instead.
Originally, there was probably some sort of intention to restrict it to
formats supported by the interop, or something. But in the end it was
overcomplicated nonsense.
In the future, we could use mp_hwdec_ctx.supported_formats or other
mechanisms to handle this in a better way.
mp_hwdec_ctx.ctx is not set to a dummy pointer - hwdec_devices_load() is
only used to detect whether to vo_opengl interop is present, and the
common hwdec code expects that the .ctx field is not NULL.
This also changes videotoolbox-copy to use --videotoolbox-format,
instead of the FFmpeg-set default.
The code for copying a videotoolbox surface to mp_image was duplicated
(with some minor differences - I picked the hw_videotoolbox.c version,
because it was "better"). mp_imgfmt_from_cvpixelformat() is somewhat
duplicated with the vt_formats[] table, but this will be fixed in a
later commit, and moving the function to shared code is preparation.
This should allow us to create the device in situations when
Direct3DCreate9 normally fails, for example if no user is logged in.
While the later use-case is not very interesting, I hope it to work in
some other situations as well, for example while certain drivers are in
exclusive full screen mode.
This is available since Windows 7, so I'm removing the old call
completely.
The lock was disabled recently. This commit gets rid of the dummied out
calls. The main reason for removing it is that there is no apparent need
for it anymore, and the new FFmpeg vaapi code does not use or provide
such a lock (there are some places which we cannot control and which do
vaapi API calls, like frame destructors).
Apparently this is the maximum that can be preserved. There is also
something about the decoder being able only to use 3 frames at a time,
and I'm assuming these are part of the 8 frames.
This can be useful in other contexts.
Note that we end up setting AVCodecContext.width/height instead of
coded_width/coded_height now. AVCodecParameters can't set coded_width,
but this is probably more correct anyway.
The FFmpeg versions we support all have the APIs we were checking for.
Only Libav missed them. Simplify this by explicitly checking for FFmpeg
in the code, instead of trying to detect the presence of the API.
Tried to decode a High 4:2:2 file, since libavcodec code seemed to
indicate that it's supported. Well, it decodes to garbage.
I couldn't find out why ffmpeg.c actually appears to reject this
correctly. The API seems to be fine with, just that the output is
garbage.
Add a hack for now.
Successful decoding of a frame resets ctx->hwdec_fail_count to 0 - which
us ok, but prevents fallback if it fails if --vd-lavc-software-fallback
is set to something higher than 1.
Just fail it immediately, since failing here always indicates some real
error (or OOM), not e.g. a video parsing error or such, which we try to
tolerate via the error counter.
Use the libavutil vdpau frame allocation code instead of our own "old"
code. This also uses its code for copying a video surface to normal
memory (used by vdpau-copy).
Since vdpau doesn't really have an internal pixel format, 4:2:0 can be
accessed as both nv12 and yuv420p - and libavutil prefers to report
yuv420p. The OpenGL interop has to be adjusted accordingly.
Preemption is a potential problem, but it doesn't break it more than it
already is.
This requires a bug fix to FFmpeg's vdpau code, or vdpau-copy (as well
as taking screenshots) will fail. Libav has fixed this bug ages ago.
In a way it can be reused. For now, sw_format and initial_pool_size
determination are still vaapi-specific. I'm hoping this can be eventally
moved to libavcodec in some way. Checking the supported_formats array is
not really vaapi-specific, and could be moved to the generic code path
too, but for now it would make things more complex.
hw_cuda.c can't use this, but hw_vdpau.c will in the following commit.
If hardware decoding is enabled (via --hwdec anything), the player was
printing an informational message that software decdoing is used.
Basically, this confuses users, because they think there is a problem or
such. Just disable the message, it's semi-useless anyway.
This was suggested on IRC, after yet another user was asking why this
message was shown (with a follow up discussion which CPUs can decode
what kind of video codecs).
There are going to be users who have a Mesa installation which do not
support 10 bit, but a GPU which can decode to 10 bit. So it's probably
better not to hardcode whether it is supported.
Introduce a more general way to signal supported formats from renderer
to decoder. Obviously this is imperfect, because it still isn't part of
proper format negotation (for example, what if there's a vavpp filter,
which accepts anything). Still slightly better than before.
I don't know any way to probe for vaapi dmabuf/EGL dmabuf support
properly (in particular testing specific formats, not just general
availability). So we stay with the current approach and try to create
and map dummy surfaces on init to probe for support. Overdo it and check
all formats that AVHWFramesConstraints reports, instead of only NV12 and
P010 surfaces.
Since we can support unknown formats now, add explicitly checks to the
EGL/dmabuf mapper code to reject unsupported formats. I also noticed
that libavutil signals support for RGB0/BGR0, but couldn't get it to
work. Remove the DRM formats that are unused/didn't work the way I tried
to use them.
With this, 10 bit decoding + rendering should work, provided you have
a capable CPU and a patched Mesa. The required Mesa patch adds support
for the R16 and GR32 formats. It was sent by a Kodi developer to the
Mesa developer mailing list and was not accepted yet.
For convenience. Since we still have code that works even if creating a
AVHWDeviceContext fails, failure is ignored. (Although currently, it
succeeds creation even with the stale/abandoned vdpau wrapper driver.)
Rendering support in Mesa probably doesn't exist yet. In theory it might
be possible to use VPP to convert the surfaces to 8 bit (like we do it
with dxva2/d3d11va as ANGLE doesn't support rendering 10 bit surface
either), but that too would require explicit mechanisms. This can't be
implemented either until I have a GPU with actual support.
Other hwdecs will also be able to use this (as soon as they are switched
to use AVHWFramesContext).
As an additional feature, failing to copy back the frame counts as
hardware decoding failure and can trigger fallback. This can be done
easily now, because it needs no way to communicate this from the hwaccel
glue code to the common code.
The old code is still required for the old decode API, until we either
drop or rewrite it. vo_vaapi.c's OSD code (fuck...) also uses these
surface functions to a higher degree.
Mostly affects conversion of the colorimetric parameters.
Not changing AV_FRAME_DATA_MASTERING_DISPLAY_METADATA handling - that's
too messy, as decoders typically output it for keyframes only, and would
require weird caching that can't even be done on the level of the frame
rewrapping functions.
This fixes direct rendering with hwdec_vaegl.c.
The code duplication between update_image_params() and
mp_image_copy_fields_from_av_frame() is quite annoying,
bit will have to be resolved in another commit.
AVHWDeviceContext.user_opaque is reserved to libavutil under certain
circumstances, while AVHWFramesContext.user_opaque is truly free for use
by us. It's slightly simpler too.
The old API is deprecated, and libavcodec prints a warning at runtime.
The new API is a bit nicer and does many things for you, such as
managing the underlying hwaccel decoder. libavutil also provides code
for managing surfaces (we use their surface pool).
The new code does not contain any code from the original MPlayer VAAPI
patch (that was used as base for some of the vaapi code in mpv). Thus
the new code is LGPL.
The new API actually does not add any visible symbols, so the only way
to detect it is a version check. Of course, the versions overlap
between FFmpeg and Libav, which requires additional care. The new
API did not get merged into FFmpeg yet, so there's no check for
FFmpeg.
This is simpler and more robust, especially for the hwdec fallback case.
The most annoying issue is that C doesn't support multiple return values
(or sum types), so the decode call gets all awkward.
The hwdec fallback case does not need to try to produce some output
after the fallback anymore. Instead, it can use the normal "replay"
code path.
We invert the "eof" bool that vd_lavc.c used internally. The
receive_frame decoder API returns the inverse of EOF, because
returning "true" from the decode function if EOF was reached
feels awkward.
Usually they happen at the same time, but conflating them is still a bit
unclean and could possibly cause problems in the future. It's also
really unnecessary.
The ffmpeg cuda wrappers need more than 1 packet for determining whether
hw decoding will work. So do something complicated and keep up to 32
packets when trying to do hw decoding, and replay the packets on the
software decoder if it doesn't work.
This code was written in a delirious state, testing for regressions and
determining whether this is worth the trouble will follow later. All mpv
git users are alpha testers as of this moment.
Fixes#3914.
Cover art handling is a disgusting hack that causes a mess in all
components. And this will stay this way. This is the Xth time I've
changed cover art handling, and that will probably also continue.
But change the code such that cover art is injected into the demux
packet stream, instead of having an explicit special case it in the
decoder glue code. (This is somewhat more similar to the cover art hack
in libavformat.)
To avoid that the over art picture is decoded again on each seek, we
need some additional "caching" in player/video.c. Decoding it after each
seek would work as well, but since cover art pictures can be pretty
huge, it's probably ok to invest some lines of code into caching it.
One weird thing is that the cover art packet will remain queued after
seeks, but that is probably not an issue.
In exchange, we can drop the dec_video.c code, which is pretty
convenient for one of the following commits. This code duplicates a
bunch of lower-level decode calls and does icky messing with this weird
state stuff, so I'm glad it goes away.
Conceptually cleaner, although the API claims this is equivalent.
Originally, AVCodecContext fields were used, because not all supported
libavcodec/libavutil versions had the AVFrame fields.
This is not done for chroma_sample_location - it has no AVFrame field.
Helps with gif, probably does unwanted things with other formats.
This doesn't handle --end quite correctly, but this could be added
later.
Fixes#3924.
Needs explicit logic. Fixes a pretty bad regression which prefers
vdpau-copy over native vaapi with direct rendering (with --hwdec=auto)
if libvdpau-va-gl1 is present. The reason is that vdpau-copy is above
vaapi, simply because all vdpau hwdecs are grouped and happened to be
listed before vaapi.
Although this is not that bad for copy-mode (unlike the case described
above), it's still a good idea to use our native vaapi code instead.
The latest 375.xx nvidia drivers add support for P016 output
surfaces. In combination with an ffmpeg change to return those
surfaces, we can display them.
The bulk of the work is related to knowing which format you're
dealing with at the right time. Once you know, it's straight forward.
At least with Nvidia drivers, some thread tries to access D3D11 objects
after ANGLE unloads d3d11.dll. Fix this by holding a reference to
d3d11.dll ourselves.
Might fix the crash in #3348. (I wish I knew why though.)
This is a bit unintuitiv, but it appears hwdec backends have to unset
hwdec_priv manually in their uninit function. Normally with this idiom
you'd expect the common code to do this (and maybe even freeing the priv
struct). Since other hwdec backends do this quite consistently, just fix
vdpau for now.
Also add an assert to detect similar bugs sooner.
Fixes#3788.
Implementation-wise, the values from the demuxer/codec header are merged
with the values from the decoder such that the former are used only
where the latter are unknown (0/auto).
At this point, all other hwaccels provide -copy modes, and vdpau is the
exception with not having one. Although there is vf_vdpaurb, it's less
convenient in certain situations, and exposes some issues with the
filter chain code as well.
Both AVFrame.pts and AVFrame.pkt_pts have existed for a long time. Until
now, decoders always returned the pts via the pkt_pts field, while the
pts field was used for encoding and libavfilter only. Recently, pkt_pts
was deprecated, and pts was switched to always carry the pts.
This means we have to be careful not to accidentally use the wrong
field, depending on the libavcodec version. We have to explicitly check
the version numbers. Of course the version numbers are completely
idiotic, because idiotically the pkg-config and library names are the
same for FFmpeg and Libav, so we have to deal with this explicitly as
well.
When the vaapi decoder is used in copy mode, it creates a dummy
display to render to. In theory, this should support hardware
decoding on on a separate GPU that is not actually connected to
any output (like an iGPU which supports more formats than the
external GPU to which the monitor is connected).
However, before this change, only X11 displays were supported as
dummy displays. This caused some graphics drivers (namely
intel-driver) to core dump when they were not actually used as X11
module.
This change introduces support for drm libav displays, which
allows vaapi-copy to run on such cards which are not actually
rendering the X11 output.
We always want to use __declspec(selectany) to declare GUIDs, but
manually including <initguid.h> in every file that used GUIDs was
error-prone. Since all <initguid.h> does is define INITGUID and include
<guiddef.h>, we can remove all references to <initguid.h> and just
compile with -DINITGUID to get the same effect.
Also, this partially reverts 622bcb0 by re-adding libuuid.a to the
build, since apparently some GUIDs (such as GUID_NULL) are not declared
in the source file, even when INITGUID is set.
This is the actual decoder output, with no overrides applied. (Maybe
video-params shouldn't contain the overrides in the first place, but
damage done.)
This really shouldn't be in vd_lavc.c - move it to dec_video.c, where it
also applies aspect overrides. This makes all overrides in one place.
The previous commit contains some required changes for resetting the
image parameters change detection (i.e. it's not done only on video
aspect override changes).
Use the new mechanism, instead of wrapped properties. As usual, extend
the update handling to some options that were forgotten/neglected
before. Rename video_reset_aspect() to video_reset_params() to make it
more "general" (and we can amazingly include write access to
video-aspect as well in this).
'cuda-gl' isn't right - you can turn this on without any GL and
get some non-zero benefit (with the cuda-copy hwaccel). So
'cuda-hwaccel' seems more consistent with everything else.
The cuvid decoder already knows how to copy back to system memory
if NV12 frames are requested, and this will happen if the decoder
is used without the hwdec.
For convenience, let's add a wrapper hwdec so people don't have
to explicitly pick the cuvid decoder if they want this behaviour.
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
These are different AVCodecContext fields. pkt_timebase is the correct
one for identifying the unit of packet/frame timestamps when decoding,
while time_base is for encoding. Some decoders also overwrite the
time_base field with some unrelated codec metadata.
pkt_timebase does not exist in Libav, so an #if is required.
Instead of passing through double float timestamps opaquely, pass real
timestamps. Do so by always setting a valid timebase on the
AVCodecContext for audio and video decoding.
Specifically try not to round timestamps to a too coarse timebase, which
could round off small adjustments to timestamps (such as for start time
rebasing or demux_timeline). If the timebase is considered too coarse,
make it finer.
This gets rid of the need to do this specifically for some hardware
decoding wrapper. The old method of passing through double timestamps
was also a bit questionable. While libavcodec is not supposed to
interpret timestamps at all if no timebase is provided, it was
needlessly tricky. Also, it actually does compare them with
AV_NOPTS_VALUE. This change will probably also reduce confusion in the
future.
The hw_subfmt field roughly corresponds to the field
AVHWFramesContext.sw_format in ffmpeg. The ffmpeg one is of the type
AVPixelFormat (instead of the underlying hardware format), so it's a
good idea to switch to this too for preparation.
Now the hw_subfmt field is an mp_imgfmt instead of an opaque/API-
specific number. VDPAU and Direct3D11 already used mp_imgfmt, but
Videotoolbox and VAAPI had to be switched.
One somewhat user-visible change is that the verbose log will now always
show the hw_subfmt as image format, instead of as nonsensical number.
(In the end it would be good if we could switch to AVHWFramesContext
completely, but the upstream API is incomplete and doesn't cover
Direct3D11 and Videotoolbox.)
This greatly improves the result when decoding typical (ST.2084) HDR
content, since the job of tone mapping gets significantly easier when
you're only mapping from 1000 to 250, rather than 10000 to 250.
The difference is so drastic that we can now even reasonably use
`hdr-tone-mapping=linear` and get a very perceptually uniform result
that is only slightly darker than normal. (To compensate for the extra
dynamic range)
Due to weird implementation details, this only seems to be present on
keyframes (or something like that), so we have to cache the last seen
value for the frames in between.
Also, in some files the metadata is just completely broken /
nonsensical, so I decided to apply a simple heuristic to detect
completely broken metadata.
This involves multiple changes:
1. Brightness metadata is split into nominal peak and signal peak.
For a quick and dirty explanation: nominal peak is the brightest value
that your color space can represent (i.e. the brightness of an encoded
1.0), and signal peak is the brightest value that actually occurs in
the video (i.e. the brightest thing that's displayed).
2. vo_opengl uses a new decision logic to figure out the right nom_peak
and sig_peak for all situations. It also does a better job of picking
the right target gamut/colorspace to use for the OSD. (Which still is
and still should be treated as sRGB). This change in logic also
fixes#3293 en passant.
3. Since it was growing rapidly, the logic for auto-guessing / inferring
the right colorimetry configuration (in pass_colormanage) was split from
the logic for actually performing the adaptation (now pass_color_map).
Right now, the new logic doesn't do a whole lot since HDR metadata is
still ignored (but not for long).
This has two reasons:
1. I tend to add new fields to this metadata, and every time I've done
so I've consistently forgotten to update all of the dozens of places in
which this colorimetry metadata might end up getting used. While most
usages don't really care about most of the metadata, sometimes the
intend was simply to “copy” the colorimetry metadata from one struct to
another. With this being inside a substruct, those lines of code can now
simply read a.color = b.color without having to care about added or
removed fields.
2. It makes the type definitions nicer for upcoming refactors.
In going through all of the usages, I also expanded a few where I felt
that omitting the “young” fields was a bug.
No method of taking a screenshot was implemented at all. vo_opengl
lacked window screenshotting, because ANGLE doesn't allow reading the
frontbuffer. There was no way to read back from a D3D11 texture either.
Implement reading image data from D3D11 textures. This is a low-quality
effort to get basic screenshots done. Eventually there will be a better
implementation: once we use AVHWFramesContext natively, the readback
implementation will be in libavcodec, and will be able to cache the
staging texture correctly. Hopefully. (For now it doesn't even have a
AVHWFramesContext for D3D11 yet. But the abstraction is more appropriate
for this purpose.)
OK, this was dumb. The file didn't have much to do with ANGLE, and the
functionality can simply be moved to d3d.c. That file contains helpers
for decoding, but can always be present (on Windows) since it doesn't
access any D3D specific libavcodec APIs. Thus it doesn't need to be
conditionally built like the actual hwaccel wrappers.
Until now, we've always converted vdpau video surfaces to RGB, and then
mapped the resulting RGB texture. Change this so that the surface is
mapped as NV12 plane textures.
The reason this wasn't done until now is because vdpau surfaces are
mapped in an "interlaced" way as separate fields, even for progressive
video. This requires messy reinterleraving. It turns out that even
though it's an extra processing step, the result can be faster than
going through the video mixer for RGB conversion.
Other than some potential speed-gain, doing this has multiple other
advantages. We can apply our own color conversion, which is important in
more complex cases. We can correctly apply debanding and potentially
other processing that requires chroma-specific or in-YUV handling.
If deinterlacing is enabled, this switches back to the old RGB
conversion method. Until we have at least a primitive deinterlacer in
vo_opengl, this will stay this way. The d3d11 and vaapi code paths are
similar. (Of course these don't require any crazy field reinterleaving.)
We now have a video filter that uses the d3d11 video processor, so it
makes no sense to have one in the VO interop code. The VO uses it for
formats not directly supported by ANGLE (so the video data is converted
to a RGB texture, which ANGLE can take in).
Change this so that the video filter is automatically inserted if
needed. Move the code that maps RGB surfaces to its own inteorp backend.
Add a bunch of new image formats, which are used to enforce the new
constraints, and to automatically insert the filter only when needed.
The added vf mechanism to auto-insert the d3d11vpp filter is very dumb
and primitive, and will work only for this specific purpose. The format
negotiation mechanism in the filter chain is generally not very pretty,
and mostly broken as well. (libavfilter has a different mechanism, and
these mechanisms don't match well, so vf_lavfi uses some sort of hack.
It only works because hwaccel and non-hwaccel formats are strictly
separated.)
The RGB interop is now only used with older ANGLE versions. The only
reason I'm keeping it is because it's relatively isolated (uses only
existing mechanisms and adds no new concepts), and because I want to be
able to compare the behavior of the old code with the new one for
testing. It will be removed eventually.
If ANGLE has NV12 interop, P010 is now handled by converting to NV12
with the video processor, instead of converting it to RGB and using the
old mechanism to import that as a texture.
For some reason, the d3d9/dxva2/d3d11 DLLs are still optional. But we
don't need to try so hard to keep exact references. In fact, there's no
reason to unload them at all.
So load them once in a central place. For simplicity, the d3d9/d3d11
backends both load all DLLs. (They will error out only if the required
DLLs could not be loaded.)
In theory, we could just call LoadLibrary multiple times (without
calling FreeLibrary), but I'm slightly worried that this could be
detected as a "bug", or that the reference count could even have a low
static limit that could be hit soon.
This uses the normal autoprobing rules like "auto", but rejects anything
that isn't flagged as copying data back to system memory.
The chunk in command.c was dead code, so remove it instead of updating
it.
We don't have any reason to disable either. Both are loaded dynamically
at runtime anyway. There is also no reason why dxva2 would disappear
from libavcodec any time soon.
This uses EGL_ANGLE_stream_producer_d3d_texture_nv12 and related
extensions to map the D3D textures coming from the hardware decoder
directly in GL.
In theory this would be trivial to achieve, but unfortunately ANGLE does
not have a mechanism to "import" D3D textures as GL textures. Instead,
an awkward mechanism via EGL_KHR_stream was implemented, which involves
at least 5 extensions and a lot of glue code. (Even worse than VAAPI EGL
interop, and very far from the simplicity you get on OSX.)
The ANGLE mechanism so far supports only the NV12 texture format, which
means 10 bit won't work. It also does not work in ES3 mode yet. For
these reasons, the "old" ID3D11VideoProcessor code is kept and used as a
fallback.
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.
The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)
This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.
Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.
The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that any in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.
This is a big refactor. Some things may be untested and could be broken.
Including initguid.h at the top of a file that uses references to GUIDs
causes the GUIDs to be declared globally with __declspec(selectany). The
'selectany' attribute tells the linker to consolidate multiple
definitions of each GUID, which would be great except that, in Cygwin
and MinGW GCC 6.1, this method of linking makes the GUIDs conflict with
the ones declared in libuuid.a.
Since initguid.h obsoletes libuuid.a in modern compilers that support
__declspec(selectany), add initguid.h to all files that use GUIDs and
remove libuuid.a from the build.
Fixes#3097
In particular, this moves the depth test to common code.
Should be functionally equivalent, except that for DXVA2, the
IDirectXVideoDecoderService_GetDecoderRenderTargets API is called
more often potentially.
Basically this gets rid of the need for the accessors in d3d11va.h, and
the code can be cleaned up a little bit.
Note that libavcodec only defines a ID3D11VideoDecoderOutputView pointer
in the last plane pointers, but it tolerates/passes through the other
plane pointers we set.
This uses ID3D11VideoProcessor to convert the video to a RGBA surface,
which is then bound to ANGLE. Currently ANGLE does not provide any way
to bind nv12 surfaces directly, so this will have to do.
ID3D11VideoContext1 would give us slightly more control about the
colorspace conversion, though it's still not good, and not available
in MinGW headers yet.
The video processor is created lazily, because we need to have the coded
frame size, of which AVFrame and mp_image have no concept of. Doing the
creation lazily is less of a pain than somehow hacking the coded frame
size into mp_image.
I'm not really sure how ID3D11VideoProcessorInputView is supposed to
work. We recreate it on every frame, which is simple and hopefully
doesn't affect performance.
For Mediacodec in particular we don't care about the format. It can just
decode to whatever it wants. The only case we would care about is it not
returning an opaque format if we don't have proper interop, but
libavcodec always returns non-opaque formats by default.
Use the recently added lavc_suffix mechanism to select the wrapper
decoder.
With all hwdec callbacks being optional, and RPI/Mediacodec having only
dummy callbacks, all the callbacks can be removed as well.
The result is that the vd_lavc_hwdec struct for both of them is tiny.
It's better to move them to vd_lavc.c directly, because they are so
trivial and small.
This is intended for cases when --hwdec needs to override the decoder
implementation in use, like for example on the RPI.
It does two things:
1. Allow the hwdec to indicate a decoder suffix. libavcodec by
convention adds a suffix to all wrapper decoders, and here we start
relying on it. While not necessarily the best idea, it's the only
thing we got. libavcodec's hwaccel list is useless, because it only
has the codec ID, not the associated decoder's name.
2. Make --hwdec=auto work properly. It shouldn't fail anymore, and hwdec
probing should reliably work, even if a different decoder is selected
with --vd. The semantics of --hwdec should dictate that it overrides
the default decoder.
Until now, we have made the assumption that a driver will use only 1
hardware surface format. the format is dictated by the driver (you
don't create surfaces with a specific format - you just pass a
rt_format and get a surface that will be in a specific driver-chosen
format).
In particular, the renderer created a dummy surface to probe the format,
and hoped the decoder would produce the same format. Due to a driver
bug this required a workaround to actually get the same format as the
driver did.
Change this so that the format is determined in the decoder. The format
is then passed down as hw_subfmt, which allows the renderer to configure
itself with the correct format. If the hardware surface changes its
format midstream, the renderer can be reconfigured using the normal
mechanisms.
This calls va_surface_init_subformat() each time after the decoder
returns a surface. Since libavcodec/AVFrame has no concept of sub-
formats, this is unavoidable. It creates and destroys a derived
VAImage, but this shouldn't have any bad performance effects (at
least I didn't notice any measurable effects).
Note that vaDeriveImage() failures are silently ignored as some
drivers (the vdpau wrapper) support neither vaDeriveImage, nor EGL
interop. In addition, we still probe whether we can map an image
in the EGL interop code. This is important as it's the only way
to determine whether EGL interop is supported at all. With respect
to the driver bug mentioned above, it doesn't matter which format
the test surface has.
In vf_vavpp, also remove the rt_format guessing business. I think the
existing logic was a bit meaningless anyway. It's not even a given
that vavpp produces the same rt_format for output.
The underlying intention of this code is to make changing
--videotoolbox-format at runtime work. For this reason, the format can't
just be statically setup, but must be read from the option at runtime.
This means the format is not fixed anymore, and we have to make sure the
renderer is property reinitialized if the format changes. There is
currently no way to trigger reinit on this level, which is why the
mp_image_params.hw_subfmt field was introduced.
One sketchy thing remains: normally, the renderer is supposed to be
involved with VO format negotiation, which would ensure that the VO
can take the format at all. Since the hw_subfmt is not part of this
format negotiation, it's implied the get_vt_fmt() callback only
returns formats supported by the renderer. This is not necessarily
clear because vo_opengl checks this with converted_imgfmt separately.
None of this matters in practice though, because we know all formats
are always supported.
(This still requires somehow triggering decoder reinit to make the
change effective.)
Until now, the presence of the process_image() callback was used to set
a delay queue with a hardcoded size. Change this to a vd_lavc_hwdec
field instead, so the decoder can explicitly set this if it's really
needed.
Do this so process_image() can be used in the VideoToolbox glue code for
something entirely unrelated.
Some functions which expected a codec name (i.e. the name of the video
format itself) were passed a decoder name. Most "native" libavcodec
decoders have the same name as the codec, so this was never an issue.
This should mean that e.g. using "--vd=lavc:h264_mmal --hwdec=mmal"
should now actually enable native surface mode (instead of doing copy-
back).
AVFormatContext.codec is deprecated now, and you're supposed to use
AVFormatContext.codecpar instead.
Handle this for all of the normal playback code.
Encoding mode isn't touched.
This commit adds the d3d11va-copy hwdec mode using the ffmpeg d3d11va
api. Functions in common with dxva2 are handled in a separate decode/d3d.c
file. A future commit will rewrite decode/dxva2.c to share this code.
The mp_set_av_packet()/mp_pts_from_av() functions check whether the
timebase is set at all (i.e. AVRational.num!=0), so there's no need to
fiddle with pointers.
Instead of displaying it only on playback start (or after switching
tracks), always display it even after a seek.
This helps with --lavfi-complex. You can now overlay e.g. audio
visualizations over cover art, and it won't break after a seek.
The downside is that this might make seeks with huge cover art slower.
There is also a glitch on seeking: since cover art pictures always
have timestamp 0, the playback time will be 0 for a moment after seek,
and then revert to audio PTS (as video is considered EOF). This is also
due to how lavfi's overlay filter behaves. (I'm not sure how to tell
lavfi that it's just a single frame.)
Deselecting cover art and then reselecting it did not work. The second
time the cover art picture is not displayed again. (This seems to break
every other month...)
The reason is commit 6640b22a. It mutates the input packet. And it is
correct that we don't own d_video->header->attached_picture at this
point. Fix it by creating a new packet reference.
Completely pointless abominations that FFmpeg refuses to remove. They
are ancient, long deprecated API which we can't use anymore. They
confused users as well.
Pretend that they don't exist. Due to the way --vd works, they can't
even be forced anymore. The older hack which explicitly rejects these
can be dropped as well.
Hr-seek was often off by one frame due to rounding issues, which have
been traditionally taken care off by adding a "tolerance". Essentially,
frames very close to the seek target PTS are not dropped, even if they
may strictly are before the seek target.
Commit 0af53353 accidentally removed this by always removing frames even
if they're within the "tolerance". Fix this by "unsharing" the logic and
making sure the segment code is inactive for normal seeks.
Doing --hwdec=auto ends up picking dxva2, creating a decoder, and then
sending D3D frames down the video chain, which immediately fails and
falls back to software.
Consider dxva2 only if the VO provides a context. If this fails,
autoprobing will proceed to try dxva2-copy as usual.
Fixes#2844.
This is in preparation for a hypothetical API change in libavcodec,
which would allow the decoder to return multiple video frames before
accepting a new input packet.
In theory, the body of the if() added to vd_lavc.c could be replaced
with this code:
packet->buffer += ret;
packet->len -= ret;
but currently this is not needed, as libavformat already outputs one
frame per packet. Also, using libavcodec this way could lead to a
"deadlock" if the decoder refuses to consume e.g. garbage padding, so
enabling this now would introduce bugs.
(Adding this now for easier testing, and for symmetry with the audio
code.)
There is some strange code which sets the DTS of the packet to PTS (but
only if it's not AVI), which apparently helps with timestamp
determination with some broken files. This code is annoying because it
tries to avoid mutating the packet (which it logically doesn't own).
Move it to where it does and get rid of the packet_copy mess.
Needed for the following commit.
This tries to determine whether packet PTS values are accurate and can
be used for frame dropping during seeking. Move both checks (PTS is
missing; PTs is non-monotonic) to the earliest place where they can be
done.
Apparently, some drivers require you to allocate all of the decoder d3d surfaces
at once. This commit changes the strategy from allocating surfaces as needed via
mp_image_pool_set_allocator, to allocating all the surfaces in one call to
IDirectXVideoDecoderService_CreateSurface and adding them to the pool with
mp_image_pool_add.
fixes#2822
This uses a different method to piece segments together. The old
approach basically changes to a new file (with a new start offset) any
time a segment ends. This meant waiting for audio/video end on segment
end, and then changing to the new segment all at once. It had a very
weird impact on the playback core, and some things (like truly gapless
segment transitions, or frame backstepping) just didn't work.
The new approach adds the demux_timeline pseudo-demuxer, which presents
an uniform packet stream from the many segments. This is pretty similar
to how ordered chapters are implemented everywhere else. It also reminds
of the FFmpeg concat pseudo-demuxer.
The "pure" version of this approach doesn't work though. Segments can
actually have different codec configurations (different extradata), and
subtitles are most likely broken too. (Subtitles have multiple corner
cases which break the pure stream-concatenation approach completely.)
To counter this, we do two things:
- Reinit the decoder with each segment. We go as far as allowing
concatenating files with completely different codecs for the sake
of EDL (which also uses the timeline infrastructure). A "lighter"
approach would try to make use of decoder mechanism to update e.g.
the extradata, but that seems fragile.
- Clip decoded data to segment boundaries. This is equivalent to
normal playback core mechanisms like hr-seek, but now the playback
core doesn't need to care about these things.
These two mechanisms are equivalent to what happened in the old
implementation, except they don't happen in the playback core anymore.
In other words, the playback core is completely relieved from timeline
implementation details. (Which honestly is exactly what I'm trying to
do here. I don't think ordered chapter behavior deserves improvement,
even if it's bad - but I want to get it out from the playback core.)
There is code duplication between audio and video decoder common code.
This is awful and could be shareable - but this will happen later.
Note that the audio path has some code to clip audio frames for the
purpose of codec preroll/gapless handling, but it's not shared as
sharing it would cause more pain than it would help.
This is required so that the individual surfaces can pass beyond the dxva2
decoder and be passed to the vo.
This also adds additional data to mp_image->planes[0] for IMGFMT_DXVA2, which is
required for maintaining and releasing the surface even if the decoder code is
uninited.
The IDirectXVideoDecoder itself is encapsulated together with its surface pool
and configuration in a dxva2_decoder structure whose creation and destruction is
managed by talloc.
Until now (and in mplayer traditionally), avi timestamps were handled
with a timestamp FIFO. AVI timestamps are essentially just strictly
increasing frame numbers and are not reordered like normal timestamps.
Limiting the FIFO is required because frames can be dropped. To make
it worse, frame dropping can't be distinguished from the decoder not
returning output due to increasing the buffering required for B-frames.
("Measuring" the buffering at playback start seems like an interesting
idea, but won't work as the buffering could be increased mid-playback.)
Another problem are skipped frames (packets with data, but which do
not contain a video frame).
Besides dropped and skipped frames, there is the problem that we can't
always know the delay. External decoders like MMAL are not going to
tell us. (And later perhaps others, like direct VideoToolbox usage.)
In general, this works not-well enough that I prefer the solution of
passing through AVI timestamps as DTS. This is slightly incorrect,
because most decoders treat DTS as mpeg-style timestamps, which
already include a b-frame delay, and thus will be shifted by a few
frames. This means there will be a problem with A/V sync in some
situations.
Note that the FFmpeg AVI demuxer shifts timestamps by an additional
amount (which increases after the first seek!?!?), which makes the
situation worse. It works well with VfW-muxed Matroska files, though.
On RPI, the first X timestamps are broken until the MMAL decoder "locks
on".
fd339e3f53 introduced a regression that caused
segfault while uniniting dxva2 decoder (and possibly vdpau too). The problem was
that it freed the avctx earlier, before calling the backend-specific uninit
which referenced it.
Revert some of the changes of that commit, and avoid calling flush by
checking whether the codec is open instead.
(Based on a PR by Kevin Mitchell.)
Signed-off-by: wm4 <wm4@nowhere>
It can be "dangerous". In particular, the decoder might have failed to
initialize, and is now in a broken state. avcodec_flush_buffers() is not
expected to be called in this state, and could trigger undefined
behavior.
Avoids "problems". In particular, it makes MMAL output a NOPTS timestamp
if the input timestamp was NOPTS.
Don't do it for other decoders. Ideally, we will at some point in the
future switch to integer fractions for timestamps at least up until the
filter layer. But this would be a larger change, and for now I'd prefer
keeping the not-rounded demuxer timestamps (if we have them).
Will be helpful for the coming filter support. I planned on merging
audio/video decoding, but this will have to wait a bit longer, so only
remove the duplicate status codes.