Commit Graph

31 Commits

Author SHA1 Message Date
wm4 00d74a509a video: fix a typo in a comment 2017-03-23 11:16:02 +01:00
wm4 6aa4efd1e3 vd_lavc, vaapi: move hw device creation to generic code
hw_vaapi.c didn't do much interesting anymore. Other than the function
to create a device for decoding with vaapi-copy, everything can be done
by generic code. Other libavcodec hwaccels are planned to provide the
same API as vaapi. It will be possible to drop the other hw_ files in
the future. They will use this generic code instead.
2017-02-20 08:39:55 +01:00
wm4 2b5577901d videotoolbox: remove weird format-negotiation between VO and decoder
Originally, there was probably some sort of intention to restrict it to
formats supported by the interop, or something. But in the end it was
overcomplicated nonsense.

In the future, we could use mp_hwdec_ctx.supported_formats or other
mechanisms to handle this in a better way.

mp_hwdec_ctx.ctx is not set to a dummy pointer - hwdec_devices_load() is
only used to detect whether to vo_opengl interop is present, and the
common hwdec code expects that the .ctx field is not NULL.

This also changes videotoolbox-copy to use --videotoolbox-format,
instead of the FFmpeg-set default.
2017-02-17 14:14:22 +01:00
wm4 bbdecb792a hwdec: add a AVBufferRef(AVHWDeviceContext) field
This makes "generic" interaction with libav* components easier.
2017-01-16 16:10:22 +01:00
wm4 812128bab7 vo_opengl, vaapi: properly probe 10 bit rendering support
There are going to be users who have a Mesa installation which do not
support 10 bit, but a GPU which can decode to 10 bit. So it's probably
better not to hardcode whether it is supported.

Introduce a more general way to signal supported formats from renderer
to decoder. Obviously this is imperfect, because it still isn't part of
proper format negotation (for example, what if there's a vavpp filter,
which accepts anything). Still slightly better than before.

I don't know any way to probe for vaapi dmabuf/EGL dmabuf support
properly (in particular testing specific formats, not just general
availability). So we stay with the current approach and try to create
and map dummy surfaces on init to probe for support. Overdo it and check
all formats that AVHWFramesConstraints reports, instead of only NV12 and
P010 surfaces.

Since we can support unknown formats now, add explicitly checks to the
EGL/dmabuf mapper code to reject unsupported formats. I also noticed
that libavutil signals support for RGB0/BGR0, but couldn't get it to
work. Remove the DRM formats that are unused/didn't work the way I tried
to use them.

With this, 10 bit decoding + rendering should work, provided you have
a capable CPU and a patched Mesa. The required Mesa patch adds support
for the R16 and GR32 formats. It was sent by a Kodi developer to the
Mesa developer mailing list and was not accepted yet.
2017-01-13 18:43:35 +01:00
Philip Langdale f5e82d5ed3 vo_opengl: hwdec_cuda: Use dynamic loading for cuda functions
This change applies the pattern used in ffmpeg to dynamically load
cuda, to avoid requiring the CUDA SDK at build time.
2016-11-23 01:07:26 +01:00
wm4 6b18d4dba5 video: add --hwdec=vdpau-copy mode
At this point, all other hwaccels provide -copy modes, and vdpau is the
exception with not having one. Although there is vf_vdpaurb, it's less
convenient in certain situations, and exposes some issues with the
filter chain code as well.
2016-10-20 16:43:02 +02:00
Philip Langdale 0a81fe1cf9 vd_lavc: Add hwdec wrapper for crystalhd
This hardware decodes to system memory so it only requires a wrapper.
2016-10-15 17:44:23 +02:00
wm4 7e6456f43a rpi: add --hwdec=rpi-copy
This means it can be used with normal video filters.

Might help out with #3604.
2016-09-30 13:05:30 +02:00
Philip Langdale 3f7e43c2e2 hwdec_cuda: Add trivial cuda-copy wrapper
The cuvid decoder already knows how to copy back to system memory
if NV12 frames are requested, and this will happen if the decoder
is used without the hwdec.

For convenience, let's add a wrapper hwdec so people don't have
to explicitly pick the cuvid decoder if they want this behaviour.
2016-09-11 10:46:22 +02:00
Philip Langdale 2048ad2b8a hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.

As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.

Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).

ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.

These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.

In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:

mpv --vd=lavc:h264_cuvid foobar.mp4

Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).

To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.

That is what this change implements.

The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.

The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.

The frames are always in NV12 format, at least until 10bit output
supports emerges.

The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.

TODO Items:
* I need to add a download_image function for screenshots. This
  would do the same copy to system memory that the decoder's
  system memory output does.
* There are items to investigate on the ffmpeg side. There appears
  to be a problem with timestamps for some content.

Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.

This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.

Usage:

You will need to specify vo=opengl and hwdec=cuda.

Note that hwdec=auto will probably not work as it will try to use
vdpau first.

mpv --hwdec=cuda --vo=opengl foobar.mp4

If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-08 16:06:12 +02:00
Aman Gupta 588b2f48e5 videotoolbox: add --hwdec=videotoolbox-copy for h/w accelerated decoding with video filters 2016-07-15 01:01:17 +02:00
wm4 70b3561270 video: add --hwdec=auto-copy mode
This uses the normal autoprobing rules like "auto", but rejects anything
that isn't flagged as copying data back to system memory.

The chunk in command.c was dead code, so remove it instead of updating
it.
2016-05-11 16:20:13 +02:00
wm4 46fff8d31a video: refactor how VO exports hwdec device handles
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.

The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)

This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.

Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.

The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that any in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.

This is a big refactor. Some things may be untested and could be broken.
2016-05-09 20:03:22 +02:00
wm4 833375f88d command: change some hwdec properties
Introduce hwdec-current and hwdec-interop properties.

Deprecate hwdec-detected, which never made a lot of sense, and which is
replaced by the new properties. hwdec-active also becomes useless, as
hwdec-current is a superset, so it's deprecated too (for now).
2016-05-04 16:55:26 +02:00
wm4 3706918311 vo_opengl: D3D11VA + ANGLE interop
This uses ID3D11VideoProcessor to convert the video to a RGBA surface,
which is then bound to ANGLE. Currently ANGLE does not provide any way
to bind nv12 surfaces directly, so this will have to do.

ID3D11VideoContext1 would give us slightly more control about the
colorspace conversion, though it's still not good, and not available
in MinGW headers yet.

The video processor is created lazily, because we need to have the coded
frame size, of which AVFrame and mp_image have no concept of. Doing the
creation lazily is less of a pain than somehow hacking the coded frame
size into mp_image.

I'm not really sure how ID3D11VideoProcessorInputView is supposed to
work. We recreate it on every frame, which is simple and hopefully
doesn't affect performance.
2016-04-27 13:49:47 +02:00
wm4 cf9b415173 hwdec: remove numbers from enum
They don't actually mean anything.

Just HWDEC_NONE should remain 0, because it's the default init value for
structs etc.
2016-04-27 13:37:55 +02:00
wm4 f033481551 videotoolbox: change how videotoolbox format is managed
The underlying intention of this code is to make changing
--videotoolbox-format at runtime work. For this reason, the format can't
just be statically setup, but must be read from the option at runtime.

This means the format is not fixed anymore, and we have to make sure the
renderer is property reinitialized if the format changes. There is
currently no way to trigger reinit on this level, which is why the
mp_image_params.hw_subfmt field was introduced.

One sketchy thing remains: normally, the renderer is supposed to be
involved with VO format negotiation, which would ensure that the VO
can take the format at all. Since the hw_subfmt is not part of this
format negotiation, it's implied the get_vt_fmt() callback only
returns formats supported by the renderer. This is not necessarily
clear because vo_opengl checks this with converted_imgfmt separately.
None of this matters in practice though, because we know all formats
are always supported.

(This still requires somehow triggering decoder reinit to make the
change effective.)
2016-04-07 19:54:58 +02:00
Kevin Mitchell a7110862c8 vd_lavc: add d3d11va hwdec
This commit adds the d3d11va-copy hwdec mode using the ffmpeg d3d11va
api. Functions in common with dxva2 are handled in a separate decode/d3d.c
file. A future commit will rewrite decode/dxva2.c to share this code.
2016-03-30 09:01:27 -07:00
Jan Ekström 5843392db5 Add a mediacodec decoder hwdec wrapper
Does the same thing as the rpi one - makes fallback possible by
pretending that h264_mediacodec is a hwdec.
2016-03-25 21:35:59 +01:00
Kevin Mitchell 084162d6fe dxva2: add interop (non-copyback) hwdec_type
This always falls back to software decoding right now. VO support will be added
in future commits.
2016-02-17 06:59:02 -08:00
wm4 1dd7b7bddc video: remove VDA support
VideoToolbox is preferred. Now that FFmpeg released 2.8, there's no
reason to support VDA anymore. In fact, we had a bug that made VDA not
useable with older FFmpeg versions in some newer mpv releases.

VideoToolbox is supported even on slightly older OSX versions, and if
not, you still can run mpv without hw decoding.
2015-09-28 22:03:14 +02:00
Sebastien Zwickert 31b5a211f4 hwdec: add VideoToolbox support
VDA is being deprecated in OS X 10.11 so this is needed to keep hwdec working.
The code needs libavcodec support which was added recently (to FFmpeg git,
libav doesn't support it).

Signed-off-by: Stefano Pigozzi <stefano.pigozzi@gmail.com>
2015-08-05 17:47:30 +02:00
wm4 1d29177c5c options: cleanup hwdec name mappings
Now there's a "canonical" table for mapping the names, that other code
can use, without having to rely too much on option code magic.

Also, use the central HWDEC constants, instead of magic values. (There
used to be semi-ok reasons to do this, but now it makes no sense
anymore.)
2015-07-07 15:05:32 +02:00
wm4 b85321d057 vo_direct3d, dxva2: use the same D3D device
Since we still read-back (and don't have hard plans on changing this),
this doesn't have much of an advantage.
2015-07-03 16:04:42 +02:00
wm4 8fff125422 RPI support
This requires FFmpeg git master for accelerated hardware decoding.
Keep in mind that FFmpeg must be compiled with --enable-mmal. Libav
will also work.

Most things work. Screenshots don't work with accelerated/opaque
decoding (except using full window screenshot mode). Subtitles are
very slow - even simple but huge overlays can cause frame drops.

This always uses fullscreen mode. It uses dispmanx and mmal directly,
and there are no window managers or anything on this level.

vo_opengl also kind of works, but is pretty useless and slow. It can't
use opaque hardware decoding (copy back can be used by forcing the
option --vd=lavc:h264_mmal). Keep in mind that the dispmanx backend
is preferred over the X11 ones in case you're trying on X11; but X11
is even more useless on RPI.

This doesn't correctly reject extended h264 profiles and thus doesn't
fallback to software decoding. The hw supports only up to the high
profile, and will e.g. return garbage for Hi10P video.

This sets a precedent of enabling hw decoding by default, but only
if RPI support is compiled (which most hopefully it will be disabled
on desktop Linux platforms). While it's more or less required to use
hw decoding on the weak RPI, it causes more problems than it solves
on real platforms (Linux has the Intel GPU problem, OSX still has
some cases with broken decoding.) So I can live with this compromise
of having different defaults depending on the platform.

Raspberry Pi 2 is required. This wasn't tested on the original RPI,
though at least decoding itself seems to work (but full playback was
not tested).
2015-03-29 16:09:56 +02:00
wm4 2a9534871d command: add property returning detected hwdec API
This is somewhat imperfect, because detection of hw decoding APIs is
mostly done on demand, and often avoided if not necessary. (For example,
we know very well that there are no hw decoders for certain codecs.)

This also requires every hwdec backend to identify itself (see hwdec.h
changes).
2015-02-02 22:43:13 +01:00
wm4 74581a6106 video: handle hwdec screenshots differently
Instead of converting the hw surface to an image in the VO, provide a
generic way to convet hw surfaces, and use this in the screenshot code.

It's all relatively straightforward, except vdpau is being terrible. It
needs a huge chunk of new code, because copying back is not simple.
2015-01-22 18:18:23 +01:00
wm4 aae9af348e video: have a generic context struct for hwdec backends
Before this commit, each hw backend had their own specific struct types
for context, and some, like VDA, had none at all. Add a context struct
(mp_hwdec_ctx) that provides a somewhat generic way to pass the hwdec
context around. Some things get slightly better, some slightly more
verbose.

mp_hwdec_info is still around; it's still needed, but is reduced to its
role of handling delayed loading of the hwdec backend.
2015-01-22 15:32:23 +01:00
wm4 df58e82237 video: move display and timing to a separate thread
The VO is run inside its own thread. It also does most of video timing.
The playloop hands the image data and a realtime timestamp to the VO,
and the VO does the rest.

In particular, this allows the playloop to do other things, instead of
blocking for video redraw. But if anything accesses the VO during video
timing, it will block.

This also fixes vo_sdl.c event handling; but that is only a side-effect,
since reimplementing the broken way would require more effort.

Also drop --softsleep. In theory, this option helps if the kernel's
sleeping mechanism is too inaccurate for video timing. In practice, I
haven't ever encountered a situation where it helps, and it just burns
CPU cycles. On the other hand it's probably actively harmful, because
it prevents the libavcodec decoder threads from doing real work.

Side note:

Originally, I intended that multiple frames can be queued to the VO. But
this is not done, due to problems with OSD and other certain features.
OSD in particular is simply designed in a way that it can be neither
timed nor copied, so you do have to render it into the video frame
before you can draw the next frame. (Subtitles have no such restriction.
sd_lavc was even updated to fix this.) It seems the right solution to
queuing multiple VO frames is rendering on VO-backed framebuffers, like
vo_vdpau.c does. This requires VO driver support, and is out of scope
of this commit.

As consequence, the VO has a queue size of 1. The existing video queue
is just needed to compute frame duration, and will be moved out in the
next commit.
2014-08-12 23:24:08 +02:00
wm4 4fa2babacc video: move struct mp_hwdec_info into its own header file
This means most code accessing this struct must now include hwdec.h
instead of dec_video.h. I just put it into dec_video.h at first because
I thought a separate file would be a waste, but it's more proper to do
it this way, as there are too many files which include dec_video.h only
to get the mp_hwdec_info definition.
2013-11-23 21:26:31 +01:00