hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
/*
|
|
|
|
* Copyright (c) 2016 Philip Langdale <philipl@overt.org>
|
|
|
|
*
|
|
|
|
* This file is part of mpv.
|
|
|
|
*
|
|
|
|
* mpv is free software; you can redistribute it and/or
|
|
|
|
* modify it under the terms of the GNU Lesser General Public
|
|
|
|
* License as published by the Free Software Foundation; either
|
|
|
|
* version 2.1 of the License, or (at your option) any later version.
|
|
|
|
*
|
|
|
|
* mpv is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
* GNU Lesser General Public License for more details.
|
|
|
|
*
|
|
|
|
* You should have received a copy of the GNU Lesser General Public
|
|
|
|
* License along with mpv. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This hwdec implements an optimized output path using CUDA->OpenGL
|
|
|
|
* interop for frame data that is stored in CUDA device memory.
|
|
|
|
* Although it is not explicit in the code here, the only practical way
|
|
|
|
* to get data in this form is from the 'cuvid' decoder (aka NvDecode).
|
|
|
|
*
|
|
|
|
* For now, cuvid/NvDecode will always return images in NV12 format, even
|
|
|
|
* when decoding 10bit streams (there is some hardware dithering going on).
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <libavutil/hwcontext.h>
|
|
|
|
#include <libavutil/hwcontext_cuda.h>
|
|
|
|
|
2016-09-10 20:50:38 +00:00
|
|
|
#include "video/mp_image_pool.h"
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
#include "hwdec.h"
|
|
|
|
#include "video.h"
|
|
|
|
|
|
|
|
#include <cudaGL.h>
|
|
|
|
|
|
|
|
struct priv {
|
|
|
|
struct mp_hwdec_ctx hwctx;
|
|
|
|
struct mp_image layout;
|
|
|
|
GLuint gl_textures[2];
|
|
|
|
GLuint gl_pbos[2];
|
2016-09-10 03:07:43 +00:00
|
|
|
CUgraphicsResource cu_res[2];
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
bool mapped;
|
|
|
|
|
|
|
|
CUcontext cuda_ctx;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int check_cu(struct gl_hwdec *hw, CUresult err, const char *func)
|
|
|
|
{
|
|
|
|
const char *err_name;
|
|
|
|
const char *err_string;
|
|
|
|
|
|
|
|
MP_TRACE(hw, "Calling %s\n", func);
|
|
|
|
|
|
|
|
if (err == CUDA_SUCCESS)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
cuGetErrorName(err, &err_name);
|
|
|
|
cuGetErrorString(err, &err_string);
|
|
|
|
|
|
|
|
MP_ERR(hw, "%s failed", func);
|
|
|
|
if (err_name && err_string)
|
|
|
|
MP_ERR(hw, " -> %s: %s", err_name, err_string);
|
|
|
|
MP_ERR(hw, "\n");
|
|
|
|
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
#define CHECK_CU(x) check_cu(hw, (x), #x)
|
|
|
|
|
2016-09-10 20:50:38 +00:00
|
|
|
static struct mp_image *cuda_download_image(struct mp_hwdec_ctx *ctx,
|
|
|
|
struct mp_image *hw_image,
|
|
|
|
struct mp_image_pool *swpool)
|
|
|
|
{
|
|
|
|
CUcontext cuda_ctx = ctx->ctx;
|
|
|
|
CUcontext dummy;
|
|
|
|
CUresult err, eerr;
|
|
|
|
|
|
|
|
if (hw_image->imgfmt != IMGFMT_CUDA)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
struct mp_image *out = mp_image_pool_get(swpool, IMGFMT_NV12,
|
|
|
|
hw_image->w, hw_image->h);
|
|
|
|
if (!out)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
err = cuCtxPushCurrent(cuda_ctx);
|
|
|
|
if (err != CUDA_SUCCESS)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
mp_image_set_size(out, hw_image->w, hw_image->h);
|
|
|
|
mp_image_copy_attributes(out, hw_image);
|
|
|
|
|
|
|
|
for (int n = 0; n < 2; n++) {
|
|
|
|
CUDA_MEMCPY2D cpy = {
|
|
|
|
.srcMemoryType = CU_MEMORYTYPE_DEVICE,
|
|
|
|
.dstMemoryType = CU_MEMORYTYPE_HOST,
|
|
|
|
.srcDevice = (CUdeviceptr)hw_image->planes[n],
|
|
|
|
.dstHost = out->planes[n],
|
|
|
|
.srcPitch = hw_image->stride[n],
|
|
|
|
.dstPitch = out->stride[n],
|
|
|
|
.WidthInBytes = mp_image_plane_w(out, n) * (n + 1),
|
|
|
|
.Height = mp_image_plane_h(out, n),
|
|
|
|
};
|
|
|
|
|
|
|
|
err = cuMemcpy2D(&cpy);
|
|
|
|
if (err != CUDA_SUCCESS) {
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
error:
|
|
|
|
eerr = cuCtxPopCurrent(&dummy);
|
|
|
|
if (eerr != CUDA_SUCCESS || err != CUDA_SUCCESS) {
|
|
|
|
talloc_free(out);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return out;
|
|
|
|
}
|
|
|
|
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
static int cuda_create(struct gl_hwdec *hw)
|
|
|
|
{
|
|
|
|
CUdevice device;
|
|
|
|
CUcontext cuda_ctx = NULL;
|
|
|
|
CUcontext dummy;
|
|
|
|
int ret = 0, eret = 0;
|
|
|
|
|
|
|
|
// PBO Requirements
|
|
|
|
if (hw->gl->version < 210 && hw->gl->es < 300) {
|
|
|
|
MP_ERR(hw, "need OpenGL >= 2.1 or OpenGL-ES >= 3.0\n");
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct priv *p = talloc_zero(hw, struct priv);
|
|
|
|
hw->priv = p;
|
|
|
|
|
|
|
|
ret = CHECK_CU(cuInit(0));
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
///TODO: Make device index configurable
|
|
|
|
ret = CHECK_CU(cuDeviceGet(&device, 0));
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
ret = CHECK_CU(cuCtxCreate(&cuda_ctx, CU_CTX_SCHED_BLOCKING_SYNC, device));
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
p->cuda_ctx = cuda_ctx;
|
|
|
|
|
|
|
|
p->hwctx = (struct mp_hwdec_ctx) {
|
|
|
|
.type = HWDEC_CUDA,
|
|
|
|
.ctx = cuda_ctx,
|
2016-09-10 20:50:38 +00:00
|
|
|
.download_image = cuda_download_image,
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
};
|
|
|
|
p->hwctx.driver_name = hw->driver->name;
|
|
|
|
hwdec_devices_add(hw->devs, &p->hwctx);
|
|
|
|
|
|
|
|
error:
|
|
|
|
eret = CHECK_CU(cuCtxPopCurrent(&dummy));
|
|
|
|
if (eret < 0)
|
|
|
|
return eret;
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int reinit(struct gl_hwdec *hw, struct mp_image_params *params)
|
|
|
|
{
|
|
|
|
struct priv *p = hw->priv;
|
|
|
|
GL *gl = hw->gl;
|
|
|
|
CUcontext dummy;
|
|
|
|
int ret = 0, eret = 0;
|
|
|
|
|
|
|
|
assert(params->imgfmt == hw->driver->imgfmt);
|
|
|
|
params->imgfmt = IMGFMT_NV12;
|
|
|
|
params->hw_subfmt = 0;
|
|
|
|
|
|
|
|
mp_image_set_params(&p->layout, params);
|
|
|
|
|
|
|
|
ret = CHECK_CU(cuCtxPushCurrent(p->cuda_ctx));
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
gl->GenTextures(2, p->gl_textures);
|
|
|
|
for (int n = 0; n < 2; n++) {
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, p->gl_textures[n]);
|
|
|
|
GLenum filter = GL_NEAREST;
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, filter);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, filter);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexImage2D(GL_TEXTURE_2D, 0, n == 0 ? GL_R8 : GL_RG8,
|
|
|
|
mp_image_plane_w(&p->layout, n),
|
|
|
|
mp_image_plane_h(&p->layout, n),
|
|
|
|
0, n == 0 ? GL_RED : GL_RG, GL_UNSIGNED_BYTE, NULL);
|
|
|
|
}
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, 0);
|
|
|
|
|
|
|
|
gl->GenBuffers(2, p->gl_pbos);
|
|
|
|
for (int n = 0; n < 2; n++) {
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, p->gl_pbos[n]);
|
|
|
|
// Chroma plane is two bytes per pixel
|
|
|
|
gl->BufferData(GL_PIXEL_UNPACK_BUFFER,
|
|
|
|
mp_image_plane_w(&p->layout, n) *
|
|
|
|
mp_image_plane_h(&p->layout, n) * (n + 1),
|
|
|
|
NULL, GL_STREAM_DRAW);
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
2016-09-10 03:07:43 +00:00
|
|
|
ret = CHECK_CU(cuGraphicsGLRegisterBuffer(&p->cu_res[n],
|
|
|
|
p->gl_pbos[n],
|
|
|
|
CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD));
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
2016-09-10 03:07:43 +00:00
|
|
|
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
error:
|
|
|
|
eret = CHECK_CU(cuCtxPopCurrent(&dummy));
|
|
|
|
if (eret < 0)
|
|
|
|
return eret;
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy(struct gl_hwdec *hw)
|
|
|
|
{
|
|
|
|
struct priv *p = hw->priv;
|
|
|
|
GL *gl = hw->gl;
|
|
|
|
CUcontext dummy;
|
|
|
|
|
|
|
|
// Don't bail if any CUDA calls fail. This is all best effort.
|
|
|
|
CHECK_CU(cuCtxPushCurrent(p->cuda_ctx));
|
|
|
|
for (int n = 0; n < 2; n++) {
|
2016-09-10 03:07:43 +00:00
|
|
|
if (p->cu_res[n] > 0)
|
|
|
|
CHECK_CU(cuGraphicsUnregisterResource(p->cu_res[n]));
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
}
|
|
|
|
CHECK_CU(cuCtxPopCurrent(&dummy));
|
|
|
|
|
|
|
|
gl->DeleteBuffers(2, p->gl_pbos);
|
|
|
|
gl->DeleteTextures(2, p->gl_textures);
|
|
|
|
|
|
|
|
hwdec_devices_remove(hw->devs, &p->hwctx);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int get_alignment(int stride)
|
|
|
|
{
|
|
|
|
if (stride % 8 == 0)
|
|
|
|
return 8;
|
|
|
|
if (stride % 4 == 0)
|
|
|
|
return 4;
|
|
|
|
if (stride % 2 == 0)
|
|
|
|
return 2;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int map_frame(struct gl_hwdec *hw, struct mp_image *hw_image,
|
|
|
|
struct gl_hwdec_frame *out_frame)
|
|
|
|
{
|
|
|
|
struct priv *p = hw->priv;
|
|
|
|
GL *gl = hw->gl;
|
|
|
|
CUcontext dummy;
|
|
|
|
int ret = 0, eret = 0;
|
|
|
|
|
|
|
|
ret = CHECK_CU(cuCtxPushCurrent(p->cuda_ctx));
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
*out_frame = (struct gl_hwdec_frame) { 0, };
|
|
|
|
|
2016-09-10 03:07:43 +00:00
|
|
|
ret = CHECK_CU(cuGraphicsMapResources(2, p->cu_res, NULL));
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
for (int n = 0; n < 2; n++) {
|
2016-09-10 03:07:43 +00:00
|
|
|
CUdeviceptr cuda_data;
|
|
|
|
size_t cuda_size;
|
|
|
|
|
|
|
|
ret = CHECK_CU(cuGraphicsResourceGetMappedPointer(&cuda_data,
|
|
|
|
&cuda_size,
|
|
|
|
p->cu_res[n]));
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
// dstPitch and widthInBytes must account for the chroma plane
|
|
|
|
// elements being two bytes wide.
|
|
|
|
CUDA_MEMCPY2D cpy = {
|
|
|
|
.srcMemoryType = CU_MEMORYTYPE_DEVICE,
|
|
|
|
.dstMemoryType = CU_MEMORYTYPE_DEVICE,
|
|
|
|
.srcDevice = (CUdeviceptr)hw_image->planes[n],
|
|
|
|
.dstDevice = cuda_data,
|
|
|
|
.srcPitch = hw_image->stride[n],
|
|
|
|
.dstPitch = mp_image_plane_w(&p->layout, n) * (n + 1),
|
|
|
|
.srcY = 0,
|
|
|
|
.WidthInBytes = mp_image_plane_w(&p->layout, n) * (n + 1),
|
|
|
|
.Height = mp_image_plane_h(&p->layout, n),
|
|
|
|
};
|
|
|
|
ret = CHECK_CU(cuMemcpy2D(&cpy));
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
|
2016-09-10 03:07:43 +00:00
|
|
|
}
|
|
|
|
ret = CHECK_CU(cuGraphicsUnmapResources(2, p->cu_res, NULL));
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
for (int n = 0; n < 2; n++) {
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, p->gl_pbos[n]);
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
gl->BindTexture(GL_TEXTURE_2D, p->gl_textures[n]);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT,
|
|
|
|
get_alignment(mp_image_plane_w(&p->layout, n)));
|
2016-09-10 03:07:43 +00:00
|
|
|
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
gl->TexSubImage2D(GL_TEXTURE_2D, 0,
|
|
|
|
0, 0,
|
|
|
|
mp_image_plane_w(&p->layout, n),
|
|
|
|
mp_image_plane_h(&p->layout, n),
|
|
|
|
n == 0 ? GL_RED : GL_RG, GL_UNSIGNED_BYTE, NULL);
|
|
|
|
|
2016-09-10 03:07:43 +00:00
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 4);
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, 0);
|
|
|
|
|
|
|
|
out_frame->planes[n] = (struct gl_hwdec_plane){
|
|
|
|
.gl_texture = p->gl_textures[n],
|
|
|
|
.gl_target = GL_TEXTURE_2D,
|
|
|
|
.tex_w = mp_image_plane_w(&p->layout, n),
|
|
|
|
.tex_h = mp_image_plane_h(&p->layout, n),
|
|
|
|
};
|
|
|
|
}
|
|
|
|
|
2016-09-10 03:07:43 +00:00
|
|
|
|
hwdec/opengl: Add support for CUDA and cuvid/NvDecode
Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross
platform, but nvidia proprietary API that exposes their hardware
video decoding capabilities. It is analogous to their DXVA or VDPAU
support on Windows or Linux but without using platform specific API
calls.
As a rule, you'd rather use DXVA or VDPAU as these are more mature
and well supported APIs, but on Linux, VDPAU is falling behind the
hardware capabilities, and there's no sign that nvidia are making
the investments to update it.
Most concretely, this means that there is no VP8/9 or HEVC Main10
support in VDPAU. On the other hand, NvDecode does export vp8/9 and
partial support for HEVC Main10 (more on that below).
ffmpeg already has support in the form of the "cuvid" family of
decoders. Due to the design of the API, it is best exposed as a full
decoder rather than an hwaccel. As such, there are decoders like
h264_cuvid, hevc_cuvid, etc.
These decoders support two output paths today - in both cases, NV12
frames are returned, either in CUDA device memory or regular system
memory.
In the case of the system memory path, the decoders can be used
as-is in mpv today with a command line like:
mpv --vd=lavc:h264_cuvid foobar.mp4
Doing this will take advantage of hardware decoding, but the cost
of the memcpy to system memory adds up, especially for high
resolution video (4K etc).
To avoid that, we need an hwdec that takes advantage of CUDA's
OpenGL interop to copy from device memory into OpenGL textures.
That is what this change implements.
The process is relatively simple as only basic device context
aquisition needs to be done by us - the CUDA buffer pool is managed
by the decoder - thankfully.
The hwdec looks a bit like the vdpau interop one - the hwdec
maintains a single set of plane textures and each output frame
is repeatedly mapped into these textures to pass on.
The frames are always in NV12 format, at least until 10bit output
supports emerges.
The only slightly interesting part of the copying process is that
CUDA works by associating PBOs, so we need to define these for
each of the textures.
TODO Items:
* I need to add a download_image function for screenshots. This
would do the same copy to system memory that the decoder's
system memory output does.
* There are items to investigate on the ffmpeg side. There appears
to be a problem with timestamps for some content.
Final note: I mentioned HEVC Main10. While there is no 10bit output
support, NvDecode can return dithered 8bit NV12 so you can take
advantage of the hardware acceleration.
This particular mode requires compiling ffmpeg with a modified
header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg
yet.
Usage:
You will need to specify vo=opengl and hwdec=cuda.
Note that hwdec=auto will probably not work as it will try to use
vdpau first.
mpv --hwdec=cuda --vo=opengl foobar.mp4
If you want to use filters that require frames in system memory,
just use the decoder directly without the hwdec, as documented
above.
2016-09-04 22:23:55 +00:00
|
|
|
error:
|
|
|
|
eret = CHECK_CU(cuCtxPopCurrent(&dummy));
|
|
|
|
if (eret < 0)
|
|
|
|
return eret;
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
const struct gl_hwdec_driver gl_hwdec_cuda = {
|
|
|
|
.name = "cuda",
|
|
|
|
.api = HWDEC_CUDA,
|
|
|
|
.imgfmt = IMGFMT_CUDA,
|
|
|
|
.create = cuda_create,
|
|
|
|
.reinit = reinit,
|
|
|
|
.map_frame = map_frame,
|
|
|
|
.destroy = destroy,
|
|
|
|
};
|