1
0
mirror of https://github.com/mpv-player/mpv synced 2025-01-01 20:32:13 +00:00
Commit Graph

23 Commits

Author SHA1 Message Date
wm4
7e6456f43a rpi: add --hwdec=rpi-copy
This means it can be used with normal video filters.

Might help out with #3604.
2016-09-30 13:05:30 +02:00
Philip Langdale
3f7e43c2e2 hwdec_cuda: Add trivial cuda-copy wrapper
The cuvid decoder already knows how to copy back to system memory
if NV12 frames are requested, and this will happen if the decoder
is used without the hwdec.

For convenience, let's add a wrapper hwdec so people don't have
to explicitly pick the cuvid decoder if they want this behaviour.
2016-09-11 10:46:22 +02:00
Philip Langdale
2048ad2b8a hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.

As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.

Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).

ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.

These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.

In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:

mpv --vd=lavc:h264_cuvid foobar.mp4

Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).

To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.

That is what this change implements.

The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.

The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.

The frames are always in NV12 format, at least until 10bit output
supports emerges.

The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.

TODO Items:
* I need to add a download_image function for screenshots. This
  would do the same copy to system memory that the decoder's
  system memory output does.
* There are items to investigate on the ffmpeg side. There appears
  to be a problem with timestamps for some content.

Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.

This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.

Usage:

You will need to specify vo=opengl and hwdec=cuda.

Note that hwdec=auto will probably not work as it will try to use
vdpau first.

mpv --hwdec=cuda --vo=opengl foobar.mp4

If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-08 16:06:12 +02:00
Aman Gupta
588b2f48e5 videotoolbox: add --hwdec=videotoolbox-copy for h/w accelerated decoding with video filters 2016-07-15 01:01:17 +02:00
wm4
70b3561270 video: add --hwdec=auto-copy mode
This uses the normal autoprobing rules like "auto", but rejects anything
that isn't flagged as copying data back to system memory.

The chunk in command.c was dead code, so remove it instead of updating
it.
2016-05-11 16:20:13 +02:00
wm4
46fff8d31a video: refactor how VO exports hwdec device handles
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.

The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)

This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.

Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.

The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that any in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.

This is a big refactor. Some things may be untested and could be broken.
2016-05-09 20:03:22 +02:00
wm4
833375f88d command: change some hwdec properties
Introduce hwdec-current and hwdec-interop properties.

Deprecate hwdec-detected, which never made a lot of sense, and which is
replaced by the new properties. hwdec-active also becomes useless, as
hwdec-current is a superset, so it's deprecated too (for now).
2016-05-04 16:55:26 +02:00
wm4
3706918311 vo_opengl: D3D11VA + ANGLE interop
This uses ID3D11VideoProcessor to convert the video to a RGBA surface,
which is then bound to ANGLE. Currently ANGLE does not provide any way
to bind nv12 surfaces directly, so this will have to do.

ID3D11VideoContext1 would give us slightly more control about the
colorspace conversion, though it's still not good, and not available
in MinGW headers yet.

The video processor is created lazily, because we need to have the coded
frame size, of which AVFrame and mp_image have no concept of. Doing the
creation lazily is less of a pain than somehow hacking the coded frame
size into mp_image.

I'm not really sure how ID3D11VideoProcessorInputView is supposed to
work. We recreate it on every frame, which is simple and hopefully
doesn't affect performance.
2016-04-27 13:49:47 +02:00
wm4
cf9b415173 hwdec: remove numbers from enum
They don't actually mean anything.

Just HWDEC_NONE should remain 0, because it's the default init value for
structs etc.
2016-04-27 13:37:55 +02:00
wm4
f033481551 videotoolbox: change how videotoolbox format is managed
The underlying intention of this code is to make changing
--videotoolbox-format at runtime work. For this reason, the format can't
just be statically setup, but must be read from the option at runtime.

This means the format is not fixed anymore, and we have to make sure the
renderer is property reinitialized if the format changes. There is
currently no way to trigger reinit on this level, which is why the
mp_image_params.hw_subfmt field was introduced.

One sketchy thing remains: normally, the renderer is supposed to be
involved with VO format negotiation, which would ensure that the VO
can take the format at all. Since the hw_subfmt is not part of this
format negotiation, it's implied the get_vt_fmt() callback only
returns formats supported by the renderer. This is not necessarily
clear because vo_opengl checks this with converted_imgfmt separately.
None of this matters in practice though, because we know all formats
are always supported.

(This still requires somehow triggering decoder reinit to make the
change effective.)
2016-04-07 19:54:58 +02:00
Kevin Mitchell
a7110862c8 vd_lavc: add d3d11va hwdec
This commit adds the d3d11va-copy hwdec mode using the ffmpeg d3d11va
api. Functions in common with dxva2 are handled in a separate decode/d3d.c
file. A future commit will rewrite decode/dxva2.c to share this code.
2016-03-30 09:01:27 -07:00
Jan Ekström
5843392db5 Add a mediacodec decoder hwdec wrapper
Does the same thing as the rpi one - makes fallback possible by
pretending that h264_mediacodec is a hwdec.
2016-03-25 21:35:59 +01:00
Kevin Mitchell
084162d6fe dxva2: add interop (non-copyback) hwdec_type
This always falls back to software decoding right now. VO support will be added
in future commits.
2016-02-17 06:59:02 -08:00
wm4
1dd7b7bddc video: remove VDA support
VideoToolbox is preferred. Now that FFmpeg released 2.8, there's no
reason to support VDA anymore. In fact, we had a bug that made VDA not
useable with older FFmpeg versions in some newer mpv releases.

VideoToolbox is supported even on slightly older OSX versions, and if
not, you still can run mpv without hw decoding.
2015-09-28 22:03:14 +02:00
Sebastien Zwickert
31b5a211f4 hwdec: add VideoToolbox support
VDA is being deprecated in OS X 10.11 so this is needed to keep hwdec working.
The code needs libavcodec support which was added recently (to FFmpeg git,
libav doesn't support it).

Signed-off-by: Stefano Pigozzi <stefano.pigozzi@gmail.com>
2015-08-05 17:47:30 +02:00
wm4
1d29177c5c options: cleanup hwdec name mappings
Now there's a "canonical" table for mapping the names, that other code
can use, without having to rely too much on option code magic.

Also, use the central HWDEC constants, instead of magic values. (There
used to be semi-ok reasons to do this, but now it makes no sense
anymore.)
2015-07-07 15:05:32 +02:00
wm4
b85321d057 vo_direct3d, dxva2: use the same D3D device
Since we still read-back (and don't have hard plans on changing this),
this doesn't have much of an advantage.
2015-07-03 16:04:42 +02:00
wm4
8fff125422 RPI support
This requires FFmpeg git master for accelerated hardware decoding.
Keep in mind that FFmpeg must be compiled with --enable-mmal. Libav
will also work.

Most things work. Screenshots don't work with accelerated/opaque
decoding (except using full window screenshot mode). Subtitles are
very slow - even simple but huge overlays can cause frame drops.

This always uses fullscreen mode. It uses dispmanx and mmal directly,
and there are no window managers or anything on this level.

vo_opengl also kind of works, but is pretty useless and slow. It can't
use opaque hardware decoding (copy back can be used by forcing the
option --vd=lavc:h264_mmal). Keep in mind that the dispmanx backend
is preferred over the X11 ones in case you're trying on X11; but X11
is even more useless on RPI.

This doesn't correctly reject extended h264 profiles and thus doesn't
fallback to software decoding. The hw supports only up to the high
profile, and will e.g. return garbage for Hi10P video.

This sets a precedent of enabling hw decoding by default, but only
if RPI support is compiled (which most hopefully it will be disabled
on desktop Linux platforms). While it's more or less required to use
hw decoding on the weak RPI, it causes more problems than it solves
on real platforms (Linux has the Intel GPU problem, OSX still has
some cases with broken decoding.) So I can live with this compromise
of having different defaults depending on the platform.

Raspberry Pi 2 is required. This wasn't tested on the original RPI,
though at least decoding itself seems to work (but full playback was
not tested).
2015-03-29 16:09:56 +02:00
wm4
2a9534871d command: add property returning detected hwdec API
This is somewhat imperfect, because detection of hw decoding APIs is
mostly done on demand, and often avoided if not necessary. (For example,
we know very well that there are no hw decoders for certain codecs.)

This also requires every hwdec backend to identify itself (see hwdec.h
changes).
2015-02-02 22:43:13 +01:00
wm4
74581a6106 video: handle hwdec screenshots differently
Instead of converting the hw surface to an image in the VO, provide a
generic way to convet hw surfaces, and use this in the screenshot code.

It's all relatively straightforward, except vdpau is being terrible. It
needs a huge chunk of new code, because copying back is not simple.
2015-01-22 18:18:23 +01:00
wm4
aae9af348e video: have a generic context struct for hwdec backends
Before this commit, each hw backend had their own specific struct types
for context, and some, like VDA, had none at all. Add a context struct
(mp_hwdec_ctx) that provides a somewhat generic way to pass the hwdec
context around. Some things get slightly better, some slightly more
verbose.

mp_hwdec_info is still around; it's still needed, but is reduced to its
role of handling delayed loading of the hwdec backend.
2015-01-22 15:32:23 +01:00
wm4
df58e82237 video: move display and timing to a separate thread
The VO is run inside its own thread. It also does most of video timing.
The playloop hands the image data and a realtime timestamp to the VO,
and the VO does the rest.

In particular, this allows the playloop to do other things, instead of
blocking for video redraw. But if anything accesses the VO during video
timing, it will block.

This also fixes vo_sdl.c event handling; but that is only a side-effect,
since reimplementing the broken way would require more effort.

Also drop --softsleep. In theory, this option helps if the kernel's
sleeping mechanism is too inaccurate for video timing. In practice, I
haven't ever encountered a situation where it helps, and it just burns
CPU cycles. On the other hand it's probably actively harmful, because
it prevents the libavcodec decoder threads from doing real work.

Side note:

Originally, I intended that multiple frames can be queued to the VO. But
this is not done, due to problems with OSD and other certain features.
OSD in particular is simply designed in a way that it can be neither
timed nor copied, so you do have to render it into the video frame
before you can draw the next frame. (Subtitles have no such restriction.
sd_lavc was even updated to fix this.) It seems the right solution to
queuing multiple VO frames is rendering on VO-backed framebuffers, like
vo_vdpau.c does. This requires VO driver support, and is out of scope
of this commit.

As consequence, the VO has a queue size of 1. The existing video queue
is just needed to compute frame duration, and will be moved out in the
next commit.
2014-08-12 23:24:08 +02:00
wm4
4fa2babacc video: move struct mp_hwdec_info into its own header file
This means most code accessing this struct must now include hwdec.h
instead of dec_video.h. I just put it into dec_video.h at first because
I thought a separate file would be a waste, but it's more proper to do
it this way, as there are too many files which include dec_video.h only
to get the mp_hwdec_info definition.
2013-11-23 21:26:31 +01:00