Commit Graph

452 Commits

Author SHA1 Message Date
Philip Langdale 2048ad2b8a hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.

As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.

Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).

ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.

These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.

In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:

mpv --vd=lavc:h264_cuvid foobar.mp4

Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).

To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.

That is what this change implements.

The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.

The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.

The frames are always in NV12 format, at least until 10bit output
supports emerges.

The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.

TODO Items:
* I need to add a download_image function for screenshots. This
  would do the same copy to system memory that the decoder's
  system memory output does.
* There are items to investigate on the ffmpeg side. There appears
  to be a problem with timestamps for some content.

Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.

This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.

Usage:

You will need to specify vo=opengl and hwdec=cuda.

Note that hwdec=auto will probably not work as it will try to use
vdpau first.

mpv --hwdec=cuda --vo=opengl foobar.mp4

If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-08 16:06:12 +02:00
wm4 edbb8f6286 vd_lavc: always force milliseconds for MMAL
This libavcodec wrapper should rescale the API timestamps to whatever
it internally needs, but it doesn't yet. So restore this code.
2016-08-29 13:15:44 +02:00
wm4 0110b738d5 vd_lavc, ad_lavc: set pkt_timebase, not time_base
These are different AVCodecContext fields. pkt_timebase is the correct
one for identifying the unit of packet/frame timestamps when decoding,
while time_base is for encoding. Some decoders also overwrite the
time_base field with some unrelated codec metadata.

pkt_timebase does not exist in Libav, so an #if is required.
2016-08-29 12:46:12 +02:00
wm4 c218d9e960 vd_lavc: minor simplification
The timebase is now always valid.
2016-08-23 12:07:46 +02:00
wm4 e5f61c2bd5 vd_lavc: remove unnecessary initialization
This is already the default value.
2016-08-19 15:00:58 +02:00
wm4 05e4df3f0c video/audio: always provide "proper" timestamps to libavcodec
Instead of passing through double float timestamps opaquely, pass real
timestamps. Do so by always setting a valid timebase on the
AVCodecContext for audio and video decoding.

Specifically try not to round timestamps to a too coarse timebase, which
could round off small adjustments to timestamps (such as for start time
rebasing or demux_timeline). If the timebase is considered too coarse,
make it finer.

This gets rid of the need to do this specifically for some hardware
decoding wrapper. The old method of passing through double timestamps
was also a bit questionable. While libavcodec is not supposed to
interpret timestamps at all if no timebase is provided, it was
needlessly tricky. Also, it actually does compare them with
AV_NOPTS_VALUE. This change will probably also reduce confusion in the
future.
2016-08-19 14:59:30 +02:00
wm4 85488f6892 video: change hw_subfmt meaning
The hw_subfmt field roughly corresponds to the field
AVHWFramesContext.sw_format in ffmpeg. The ffmpeg one is of the type
AVPixelFormat (instead of the underlying hardware format), so it's a
good idea to switch to this too for preparation.

Now the hw_subfmt field is an mp_imgfmt instead of an opaque/API-
specific number. VDPAU and Direct3D11 already used mp_imgfmt, but
Videotoolbox and VAAPI had to be switched.

One somewhat user-visible change is that the verbose log will now always
show the hw_subfmt as image format, instead of as nonsensical number.

(In the end it would be good if we could switch to AVHWFramesContext
completely, but the upstream API is incomplete and doesn't cover
Direct3D11 and Videotoolbox.)
2016-07-15 13:04:17 +02:00
Aman Gupta 588b2f48e5 videotoolbox: add --hwdec=videotoolbox-copy for h/w accelerated decoding with video filters 2016-07-15 01:01:17 +02:00
Niklas Haas 5b6cce2b73 vd_lavc: expose mastering display side data reference peak
This greatly improves the result when decoding typical (ST.2084) HDR
content, since the job of tone mapping gets significantly easier when
you're only mapping from 1000 to 250, rather than 10000 to 250.

The difference is so drastic that we can now even reasonably use
`hdr-tone-mapping=linear` and get a very perceptually uniform result
that is only slightly darker than normal. (To compensate for the extra
dynamic range)

Due to weird implementation details, this only seems to be present on
keyframes (or something like that), so we have to cache the last seen
value for the frames in between.

Also, in some files the metadata is just completely broken /
nonsensical, so I decided to apply a simple heuristic to detect
completely broken metadata.
2016-07-03 19:42:52 +02:00
Niklas Haas 923e3c7b20 vo_opengl: generalize HDR tone mapping mechanism
This involves multiple changes:

1. Brightness metadata is split into nominal peak and signal peak.

For a quick and dirty explanation: nominal peak is the brightest value
that your color space can represent (i.e. the brightness of an encoded
1.0), and signal peak is the brightest value that actually occurs in
the video (i.e. the brightest thing that's displayed).

2. vo_opengl uses a new decision logic to figure out the right nom_peak
and sig_peak for all situations. It also does a better job of picking
the right target gamut/colorspace to use for the OSD. (Which still is
and still should be treated as sRGB). This change in logic also
fixes #3293 en passant.

3. Since it was growing rapidly, the logic for auto-guessing / inferring
the right colorimetry configuration (in pass_colormanage) was split from
the logic for actually performing the adaptation (now pass_color_map).

Right now, the new logic doesn't do a whole lot since HDR metadata is
still ignored (but not for long).
2016-07-03 19:42:52 +02:00
Niklas Haas d81fb97f45 mp_image: split colorimetry metadata into its own struct
This has two reasons:

1. I tend to add new fields to this metadata, and every time I've done
so I've consistently forgotten to update all of the dozens of places in
which this colorimetry metadata might end up getting used. While most
usages don't really care about most of the metadata, sometimes the
intend was simply to “copy” the colorimetry metadata from one struct to
another. With this being inside a substruct, those lines of code can now
simply read a.color = b.color without having to care about added or
removed fields.

2. It makes the type definitions nicer for upcoming refactors.

In going through all of the usages, I also expanded a few where I felt
that omitting the “young” fields was a bug.
2016-07-03 19:42:52 +02:00
Ben Boeckel 76a73a0a5d vd_lavc: hide structs behind platform flags
Otherwise, warnings about them being unused appear.
2016-07-01 19:12:34 -04:00
wm4 9ca1592f3f d3d: implement screenshots for --hwdec=d3d11va
No method of taking a screenshot was implemented at all. vo_opengl
lacked window screenshotting, because ANGLE doesn't allow reading the
frontbuffer. There was no way to read back from a D3D11 texture either.

Implement reading image data from D3D11 textures. This is a low-quality
effort to get basic screenshots done. Eventually there will be a better
implementation: once we use AVHWFramesContext natively, the readback
implementation will be in libavcodec, and will be able to cache the
staging texture correctly. Hopefully. (For now it doesn't even have a
AVHWFramesContext for D3D11 yet. But the abstraction is more appropriate
for this purpose.)
2016-06-28 20:38:53 +02:00
wm4 17c5738cb4 d3d: merge angle_common.h into d3d.h
OK, this was dumb. The file didn't have much to do with ANGLE, and the
functionality can simply be moved to d3d.c. That file contains helpers
for decoding, but can always be present (on Windows) since it doesn't
access any D3D specific libavcodec APIs. Thus it doesn't need to be
conditionally built like the actual hwaccel wrappers.
2016-06-28 20:07:56 +02:00
stepshal c5094206ce Fix misspellings 2016-06-26 13:47:21 +02:00
wm4 7be37337f4 vo_opengl: vdpau interop without RGB conversion
Until now, we've always converted vdpau video surfaces to RGB, and then
mapped the resulting RGB texture. Change this so that the surface is
mapped as NV12 plane textures.

The reason this wasn't done until now is because vdpau surfaces are
mapped in an "interlaced" way as separate fields, even for progressive
video. This requires messy reinterleraving. It turns out that even
though it's an extra processing step, the result can be faster than
going through the video mixer for RGB conversion.

Other than some potential speed-gain, doing this has multiple other
advantages. We can apply our own color conversion, which is important in
more complex cases. We can correctly apply debanding and potentially
other processing that requires chroma-specific or in-YUV handling.

If deinterlacing is enabled, this switches back to the old RGB
conversion method. Until we have at least a primitive deinterlacer in
vo_opengl, this will stay this way. The d3d11 and vaapi code paths are
similar. (Of course these don't require any crazy field reinterleaving.)
2016-06-19 19:58:40 +02:00
wm4 7ecac3ae6f d3d11va: remove unused d3d11va_surface.subindex field
This is now stored within the AVFrame/mp_image.
2016-06-16 18:13:46 +02:00
James Ross-Gowan 88b584656d dxva2: remove dead code in failure case
This was made unnecessary by a02d77b, since the only way the function
could fail was by failing to add a reference to the DirectX DLLs.
2016-06-07 18:53:05 +10:00
wm4 0348cd080f video: remove d3d11 video processor use from OpenGL interop
We now have a video filter that uses the d3d11 video processor, so it
makes no sense to have one in the VO interop code. The VO uses it for
formats not directly supported by ANGLE (so the video data is converted
to a RGB texture, which ANGLE can take in).

Change this so that the video filter is automatically inserted if
needed. Move the code that maps RGB surfaces to its own inteorp backend.
Add a bunch of new image formats, which are used to enforce the new
constraints, and to automatically insert the filter only when needed.

The added vf mechanism to auto-insert the d3d11vpp filter is very dumb
and primitive, and will work only for this specific purpose. The format
negotiation mechanism in the filter chain is generally not very pretty,
and mostly broken as well. (libavfilter has a different mechanism, and
these mechanisms don't match well, so vf_lavfi uses some sort of hack.
It only works because hwaccel and non-hwaccel formats are strictly
separated.)

The RGB interop is now only used with older ANGLE versions. The only
reason I'm keeping it is because it's relatively isolated (uses only
existing mechanisms and adds no new concepts), and because I want to be
able to compare the behavior of the old code with the new one for
testing. It will be removed eventually.

If ANGLE has NV12 interop, P010 is now handled by converting to NV12
with the video processor, instead of converting it to RGB and using the
old mechanism to import that as a texture.
2016-05-29 19:00:55 +02:00
wm4 a02d77ba0d d3d: simplify DLL loading
For some reason, the d3d9/dxva2/d3d11 DLLs are still optional. But we
don't need to try so hard to keep exact references. In fact, there's no
reason to unload them at all.

So load them once in a central place. For simplicity, the d3d9/d3d11
backends both load all DLLs. (They will error out only if the required
DLLs could not be loaded.)

In theory, we could just call LoadLibrary multiple times (without
calling FreeLibrary), but I'm slightly worried that this could be
detected as a "bug", or that the reference count could even have a low
static limit that could be hit soon.
2016-05-17 11:59:54 +02:00
wm4 39b64fb176 video: merge dxva2 source files
video/dxva2.c exported only 2 functions, both used only by
video/decode/dxva2.c.

The same was already done for d3d11.
2016-05-17 11:05:51 +02:00
wm4 32c10956e0 vaapi: avoid forward declaration of variable
Why is everything so horrible.
2016-05-15 18:37:51 +02:00
wm4 70b3561270 video: add --hwdec=auto-copy mode
This uses the normal autoprobing rules like "auto", but rejects anything
that isn't flagged as copying data back to system memory.

The chunk in command.c was dead code, so remove it instead of updating
it.
2016-05-11 16:20:13 +02:00
wm4 fd82e14888 build: merge d3d11va and dxva2 hwaccel checks
We don't have any reason to disable either. Both are loaded dynamically
at runtime anyway. There is also no reason why dxva2 would disappear
from libavcodec any time soon.
2016-05-11 15:40:31 +02:00
wm4 a3d416c3d3 vo_opengl: d3d11egl: native NV12 sampling support
This uses EGL_ANGLE_stream_producer_d3d_texture_nv12 and related
extensions to map the D3D textures coming from the hardware decoder
directly in GL.

In theory this would be trivial to achieve, but unfortunately ANGLE does
not have a mechanism to "import" D3D textures as GL textures. Instead,
an awkward mechanism via EGL_KHR_stream was implemented, which involves
at least 5 extensions and a lot of glue code. (Even worse than VAAPI EGL
interop, and very far from the simplicity you get on OSX.)

The ANGLE mechanism so far supports only the NV12 texture format, which
means 10 bit won't work. It also does not work in ES3 mode yet. For
these reasons, the "old" ID3D11VideoProcessor code is kept and used as a
fallback.
2016-05-10 21:06:34 +02:00
wm4 46fff8d31a video: refactor how VO exports hwdec device handles
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.

The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)

This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.

Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.

The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that any in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.

This is a big refactor. Some things may be untested and could be broken.
2016-05-09 20:03:22 +02:00
wm4 5be40f035b d3d: DXVA2_ModeMPEG2_VLD supports all profiles
Fixes hardware decoding of most mpeg2 things.
2016-05-03 15:46:16 +02:00
James Ross-Gowan 622bcb0e37 win32: replace libuuid.a usage with initguid.h
Including initguid.h at the top of a file that uses references to GUIDs
causes the GUIDs to be declared globally with __declspec(selectany). The
'selectany' attribute tells the linker to consolidate multiple
definitions of each GUID, which would be great except that, in Cygwin
and MinGW GCC 6.1, this method of linking makes the GUIDs conflict with
the ones declared in libuuid.a.

Since initguid.h obsoletes libuuid.a in modern compilers that support
__declspec(selectany), add initguid.h to all files that use GUIDs and
remove libuuid.a from the build.

Fixes #3097
2016-05-01 21:10:24 +10:00
Kevin Mitchell 8d51f08010 d3d11va: fix invalid deref on decoder init failure
fixes #3092
2016-04-29 23:36:01 -07:00
wm4 64f9e48bf1 d3d11va, dxva2: return the format struct directly
Slight simplification, IMHO.
2016-04-29 23:30:01 +02:00
wm4 016eab2209 d3d11va, dxva2: simplify decoder selection
In particular, this moves the depth test to common code.

Should be functionally equivalent, except that for DXVA2, the
IDirectXVideoDecoderService_GetDecoderRenderTargets API is called
more often potentially.
2016-04-29 23:24:28 +02:00
wm4 bda111018c video: add IMGFMT_P010 alias
Gets rid of some silliness, and might be useful in the future.
2016-04-29 22:38:54 +02:00
wm4 dff33893f2 d3d11va: store texture/subindex in IMGFMT_D3D11VA plane pointers
Basically this gets rid of the need for the accessors in d3d11va.h, and
the code can be cleaned up a little bit.

Note that libavcodec only defines a ID3D11VideoDecoderOutputView pointer
in the last plane pointers, but it tolerates/passes through the other
plane pointers we set.
2016-04-27 14:06:50 +02:00
wm4 9896994688 vd_lavc: adjust D3D11VA autoprobe order
We want to prefer d3d11va over dxva2 anything. But since dxva2 copyback
is more efficient than d3d11va's currently, d3d11va-copy should come
last.
2016-04-27 13:54:20 +02:00
wm4 3706918311 vo_opengl: D3D11VA + ANGLE interop
This uses ID3D11VideoProcessor to convert the video to a RGBA surface,
which is then bound to ANGLE. Currently ANGLE does not provide any way
to bind nv12 surfaces directly, so this will have to do.

ID3D11VideoContext1 would give us slightly more control about the
colorspace conversion, though it's still not good, and not available
in MinGW headers yet.

The video processor is created lazily, because we need to have the coded
frame size, of which AVFrame and mp_image have no concept of. Doing the
creation lazily is less of a pain than somehow hacking the coded frame
size into mp_image.

I'm not really sure how ID3D11VideoProcessorInputView is supposed to
work. We recreate it on every frame, which is simple and hopefully
doesn't affect performance.
2016-04-27 13:49:47 +02:00
wm4 da59726776 vd_lavc: hack against videotoolbox crash on failure
I guess this won't ever be fixed properly in FFmpeg. Too hairy, and the
alternative (using VideoToolbox as "full decoder") is too attractive.
2016-04-26 18:53:58 +02:00
wm4 8ffd2f1dd4 vd_lavc: simplify some unneeded ifdeffery
These were for ancient libavcodec versions.
2016-04-25 19:11:16 +02:00
wm4 6a1814dfb8 vd_lavc: make image_format hwdec field optional
For Mediacodec in particular we don't care about the format. It can just
decode to whatever it wants. The only case we would care about is it not
returning an opaque format if we don't have proper interop, but
libavcodec always returns non-opaque formats by default.
2016-04-25 12:23:38 +02:00
wm4 7e3d8e7134 vd_lavc: simplify RPI and Mediacodec wrappers
Use the recently added lavc_suffix mechanism to select the wrapper
decoder.

With all hwdec callbacks being optional, and RPI/Mediacodec having only
dummy callbacks, all the callbacks can be removed as well.

The result is that the vd_lavc_hwdec struct for both of them is tiny.
It's better to move them to vd_lavc.c directly, because they are so
trivial and small.
2016-04-25 12:23:38 +02:00
wm4 46e49a37be vd_lavc: make all hwdec callbacks optional 2016-04-25 12:23:37 +02:00
wm4 bb17df1f07 vd_lavc: set AVCodecContext.time_base to forced time base
This is a bit sketchy, as there isn't a truly standard way to
communicate the timebase.
2016-04-25 12:23:37 +02:00
wm4 4f5509e1dd vd_lavc: better hwdec wrapper decoder selection
This is intended for cases when --hwdec needs to override the decoder
implementation in use, like for example on the RPI.

It does two things:
1. Allow the hwdec to indicate a decoder suffix. libavcodec by
   convention adds a suffix to all wrapper decoders, and here we start
   relying on it. While not necessarily the best idea, it's the only
   thing we got. libavcodec's hwaccel list is useless, because it only
   has the codec ID, not the associated decoder's name.
2. Make --hwdec=auto work properly. It shouldn't fail anymore, and hwdec
   probing should reliably work, even if a different decoder is selected
   with --vd. The semantics of --hwdec should dictate that it overrides
   the default decoder.
2016-04-25 12:13:12 +02:00
wm4 85416bc36a vd_lavc: allow process_image() to return NULL
In case of errors or whatever.
2016-04-25 11:30:07 +02:00
wm4 b1a8e8dba6 vd_lavc: fix hwdec fallback if hwdec pre-initialization fails
Damn.
2016-04-22 15:52:16 +02:00
Kevin Mitchell ce153bdb42 d3dva: move Intel_H264_NoFGT_ClearVideo to lower priority
This seems to cause problems, so only use it if H264_E is not available.

fixes #3059
2016-04-18 02:00:50 -07:00
Kevin Mitchell 2f5c7af914 dxva2: fix missing newline in error message 2016-04-18 01:58:18 -07:00
Kevin Mitchell 611594e7e8 d3dva: include selected decoder and format in verbose output 2016-04-17 18:28:14 -07:00
wm4 f5ff2656e0 vaapi: determine surface format in decoder, not in renderer
Until now, we have made the assumption that a driver will use only 1
hardware surface format. the format is dictated by the driver (you
don't create surfaces with a specific format - you just pass a
rt_format and get a surface that will be in a specific driver-chosen
format).

In particular, the renderer created a dummy surface to probe the format,
and hoped the decoder would produce the same format. Due to a driver
bug this required a workaround to actually get the same format as the
driver did.

Change this so that the format is determined in the decoder. The format
is then passed down as hw_subfmt, which allows the renderer to configure
itself with the correct format. If the hardware surface changes its
format midstream, the renderer can be reconfigured using the normal
mechanisms.

This calls va_surface_init_subformat() each time after the decoder
returns a surface. Since libavcodec/AVFrame has no concept of sub-
formats, this is unavoidable. It creates and destroys a derived
VAImage, but this shouldn't have any bad performance effects (at
least I didn't notice any measurable effects).

Note that vaDeriveImage() failures are silently ignored as some
drivers (the vdpau wrapper) support neither vaDeriveImage, nor EGL
interop. In addition, we still probe whether we can map an image
in the EGL interop code. This is important as it's the only way
to determine whether EGL interop is supported at all. With respect
to the driver bug mentioned above, it doesn't matter which format
the test surface has.

In vf_vavpp, also remove the rt_format guessing business. I think the
existing logic was a bit meaningless anyway. It's not even a given
that vavpp produces the same rt_format for output.
2016-04-11 22:03:26 +02:00
wm4 813372d6e9 d3d: fix Windows build
Commit f009d16f accidentally broke it.

Thanks to RiCON for noticing and testing.
2016-04-07 21:04:31 +02:00
wm4 f033481551 videotoolbox: change how videotoolbox format is managed
The underlying intention of this code is to make changing
--videotoolbox-format at runtime work. For this reason, the format can't
just be statically setup, but must be read from the option at runtime.

This means the format is not fixed anymore, and we have to make sure the
renderer is property reinitialized if the format changes. There is
currently no way to trigger reinit on this level, which is why the
mp_image_params.hw_subfmt field was introduced.

One sketchy thing remains: normally, the renderer is supposed to be
involved with VO format negotiation, which would ensure that the VO
can take the format at all. Since the hw_subfmt is not part of this
format negotiation, it's implied the get_vt_fmt() callback only
returns formats supported by the renderer. This is not necessarily
clear because vo_opengl checks this with converted_imgfmt separately.
None of this matters in practice though, because we know all formats
are always supported.

(This still requires somehow triggering decoder reinit to make the
change effective.)
2016-04-07 19:54:58 +02:00