Usually they happen at the same time, but conflating them is still a bit
unclean and could possibly cause problems in the future. It's also
really unnecessary.
The ffmpeg cuda wrappers need more than 1 packet for determining whether
hw decoding will work. So do something complicated and keep up to 32
packets when trying to do hw decoding, and replay the packets on the
software decoder if it doesn't work.
This code was written in a delirious state, testing for regressions and
determining whether this is worth the trouble will follow later. All mpv
git users are alpha testers as of this moment.
Fixes#3914.
Cover art handling is a disgusting hack that causes a mess in all
components. And this will stay this way. This is the Xth time I've
changed cover art handling, and that will probably also continue.
But change the code such that cover art is injected into the demux
packet stream, instead of having an explicit special case it in the
decoder glue code. (This is somewhat more similar to the cover art hack
in libavformat.)
To avoid that the over art picture is decoded again on each seek, we
need some additional "caching" in player/video.c. Decoding it after each
seek would work as well, but since cover art pictures can be pretty
huge, it's probably ok to invest some lines of code into caching it.
One weird thing is that the cover art packet will remain queued after
seeks, but that is probably not an issue.
In exchange, we can drop the dec_video.c code, which is pretty
convenient for one of the following commits. This code duplicates a
bunch of lower-level decode calls and does icky messing with this weird
state stuff, so I'm glad it goes away.
Conceptually cleaner, although the API claims this is equivalent.
Originally, AVCodecContext fields were used, because not all supported
libavcodec/libavutil versions had the AVFrame fields.
This is not done for chroma_sample_location - it has no AVFrame field.
Helps with gif, probably does unwanted things with other formats.
This doesn't handle --end quite correctly, but this could be added
later.
Fixes#3924.
Needs explicit logic. Fixes a pretty bad regression which prefers
vdpau-copy over native vaapi with direct rendering (with --hwdec=auto)
if libvdpau-va-gl1 is present. The reason is that vdpau-copy is above
vaapi, simply because all vdpau hwdecs are grouped and happened to be
listed before vaapi.
Although this is not that bad for copy-mode (unlike the case described
above), it's still a good idea to use our native vaapi code instead.
The latest 375.xx nvidia drivers add support for P016 output
surfaces. In combination with an ffmpeg change to return those
surfaces, we can display them.
The bulk of the work is related to knowing which format you're
dealing with at the right time. Once you know, it's straight forward.
At least with Nvidia drivers, some thread tries to access D3D11 objects
after ANGLE unloads d3d11.dll. Fix this by holding a reference to
d3d11.dll ourselves.
Might fix the crash in #3348. (I wish I knew why though.)
This is a bit unintuitiv, but it appears hwdec backends have to unset
hwdec_priv manually in their uninit function. Normally with this idiom
you'd expect the common code to do this (and maybe even freeing the priv
struct). Since other hwdec backends do this quite consistently, just fix
vdpau for now.
Also add an assert to detect similar bugs sooner.
Fixes#3788.
Implementation-wise, the values from the demuxer/codec header are merged
with the values from the decoder such that the former are used only
where the latter are unknown (0/auto).
At this point, all other hwaccels provide -copy modes, and vdpau is the
exception with not having one. Although there is vf_vdpaurb, it's less
convenient in certain situations, and exposes some issues with the
filter chain code as well.
Both AVFrame.pts and AVFrame.pkt_pts have existed for a long time. Until
now, decoders always returned the pts via the pkt_pts field, while the
pts field was used for encoding and libavfilter only. Recently, pkt_pts
was deprecated, and pts was switched to always carry the pts.
This means we have to be careful not to accidentally use the wrong
field, depending on the libavcodec version. We have to explicitly check
the version numbers. Of course the version numbers are completely
idiotic, because idiotically the pkg-config and library names are the
same for FFmpeg and Libav, so we have to deal with this explicitly as
well.
When the vaapi decoder is used in copy mode, it creates a dummy
display to render to. In theory, this should support hardware
decoding on on a separate GPU that is not actually connected to
any output (like an iGPU which supports more formats than the
external GPU to which the monitor is connected).
However, before this change, only X11 displays were supported as
dummy displays. This caused some graphics drivers (namely
intel-driver) to core dump when they were not actually used as X11
module.
This change introduces support for drm libav displays, which
allows vaapi-copy to run on such cards which are not actually
rendering the X11 output.
We always want to use __declspec(selectany) to declare GUIDs, but
manually including <initguid.h> in every file that used GUIDs was
error-prone. Since all <initguid.h> does is define INITGUID and include
<guiddef.h>, we can remove all references to <initguid.h> and just
compile with -DINITGUID to get the same effect.
Also, this partially reverts 622bcb0 by re-adding libuuid.a to the
build, since apparently some GUIDs (such as GUID_NULL) are not declared
in the source file, even when INITGUID is set.
This is the actual decoder output, with no overrides applied. (Maybe
video-params shouldn't contain the overrides in the first place, but
damage done.)
This really shouldn't be in vd_lavc.c - move it to dec_video.c, where it
also applies aspect overrides. This makes all overrides in one place.
The previous commit contains some required changes for resetting the
image parameters change detection (i.e. it's not done only on video
aspect override changes).
Use the new mechanism, instead of wrapped properties. As usual, extend
the update handling to some options that were forgotten/neglected
before. Rename video_reset_aspect() to video_reset_params() to make it
more "general" (and we can amazingly include write access to
video-aspect as well in this).
'cuda-gl' isn't right - you can turn this on without any GL and
get some non-zero benefit (with the cuda-copy hwaccel). So
'cuda-hwaccel' seems more consistent with everything else.
The cuvid decoder already knows how to copy back to system memory
if NV12 frames are requested, and this will happen if the decoder
is used without the hwdec.
For convenience, let's add a wrapper hwdec so people don't have
to explicitly pick the cuvid decoder if they want this behaviour.
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
These are different AVCodecContext fields. pkt_timebase is the correct
one for identifying the unit of packet/frame timestamps when decoding,
while time_base is for encoding. Some decoders also overwrite the
time_base field with some unrelated codec metadata.
pkt_timebase does not exist in Libav, so an #if is required.
Instead of passing through double float timestamps opaquely, pass real
timestamps. Do so by always setting a valid timebase on the
AVCodecContext for audio and video decoding.
Specifically try not to round timestamps to a too coarse timebase, which
could round off small adjustments to timestamps (such as for start time
rebasing or demux_timeline). If the timebase is considered too coarse,
make it finer.
This gets rid of the need to do this specifically for some hardware
decoding wrapper. The old method of passing through double timestamps
was also a bit questionable. While libavcodec is not supposed to
interpret timestamps at all if no timebase is provided, it was
needlessly tricky. Also, it actually does compare them with
AV_NOPTS_VALUE. This change will probably also reduce confusion in the
future.
The hw_subfmt field roughly corresponds to the field
AVHWFramesContext.sw_format in ffmpeg. The ffmpeg one is of the type
AVPixelFormat (instead of the underlying hardware format), so it's a
good idea to switch to this too for preparation.
Now the hw_subfmt field is an mp_imgfmt instead of an opaque/API-
specific number. VDPAU and Direct3D11 already used mp_imgfmt, but
Videotoolbox and VAAPI had to be switched.
One somewhat user-visible change is that the verbose log will now always
show the hw_subfmt as image format, instead of as nonsensical number.
(In the end it would be good if we could switch to AVHWFramesContext
completely, but the upstream API is incomplete and doesn't cover
Direct3D11 and Videotoolbox.)
This greatly improves the result when decoding typical (ST.2084) HDR
content, since the job of tone mapping gets significantly easier when
you're only mapping from 1000 to 250, rather than 10000 to 250.
The difference is so drastic that we can now even reasonably use
`hdr-tone-mapping=linear` and get a very perceptually uniform result
that is only slightly darker than normal. (To compensate for the extra
dynamic range)
Due to weird implementation details, this only seems to be present on
keyframes (or something like that), so we have to cache the last seen
value for the frames in between.
Also, in some files the metadata is just completely broken /
nonsensical, so I decided to apply a simple heuristic to detect
completely broken metadata.
This involves multiple changes:
1. Brightness metadata is split into nominal peak and signal peak.
For a quick and dirty explanation: nominal peak is the brightest value
that your color space can represent (i.e. the brightness of an encoded
1.0), and signal peak is the brightest value that actually occurs in
the video (i.e. the brightest thing that's displayed).
2. vo_opengl uses a new decision logic to figure out the right nom_peak
and sig_peak for all situations. It also does a better job of picking
the right target gamut/colorspace to use for the OSD. (Which still is
and still should be treated as sRGB). This change in logic also
fixes#3293 en passant.
3. Since it was growing rapidly, the logic for auto-guessing / inferring
the right colorimetry configuration (in pass_colormanage) was split from
the logic for actually performing the adaptation (now pass_color_map).
Right now, the new logic doesn't do a whole lot since HDR metadata is
still ignored (but not for long).
This has two reasons:
1. I tend to add new fields to this metadata, and every time I've done
so I've consistently forgotten to update all of the dozens of places in
which this colorimetry metadata might end up getting used. While most
usages don't really care about most of the metadata, sometimes the
intend was simply to “copy” the colorimetry metadata from one struct to
another. With this being inside a substruct, those lines of code can now
simply read a.color = b.color without having to care about added or
removed fields.
2. It makes the type definitions nicer for upcoming refactors.
In going through all of the usages, I also expanded a few where I felt
that omitting the “young” fields was a bug.
No method of taking a screenshot was implemented at all. vo_opengl
lacked window screenshotting, because ANGLE doesn't allow reading the
frontbuffer. There was no way to read back from a D3D11 texture either.
Implement reading image data from D3D11 textures. This is a low-quality
effort to get basic screenshots done. Eventually there will be a better
implementation: once we use AVHWFramesContext natively, the readback
implementation will be in libavcodec, and will be able to cache the
staging texture correctly. Hopefully. (For now it doesn't even have a
AVHWFramesContext for D3D11 yet. But the abstraction is more appropriate
for this purpose.)
OK, this was dumb. The file didn't have much to do with ANGLE, and the
functionality can simply be moved to d3d.c. That file contains helpers
for decoding, but can always be present (on Windows) since it doesn't
access any D3D specific libavcodec APIs. Thus it doesn't need to be
conditionally built like the actual hwaccel wrappers.
Until now, we've always converted vdpau video surfaces to RGB, and then
mapped the resulting RGB texture. Change this so that the surface is
mapped as NV12 plane textures.
The reason this wasn't done until now is because vdpau surfaces are
mapped in an "interlaced" way as separate fields, even for progressive
video. This requires messy reinterleraving. It turns out that even
though it's an extra processing step, the result can be faster than
going through the video mixer for RGB conversion.
Other than some potential speed-gain, doing this has multiple other
advantages. We can apply our own color conversion, which is important in
more complex cases. We can correctly apply debanding and potentially
other processing that requires chroma-specific or in-YUV handling.
If deinterlacing is enabled, this switches back to the old RGB
conversion method. Until we have at least a primitive deinterlacer in
vo_opengl, this will stay this way. The d3d11 and vaapi code paths are
similar. (Of course these don't require any crazy field reinterleaving.)