mpv/video/out/opengl/hwdec_cuda.c

339 lines
9.2 KiB
C
Raw Normal View History

hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
/*
* Copyright (c) 2016 Philip Langdale <philipl@overt.org>
*
* This file is part of mpv.
*
* mpv is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* mpv is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with mpv. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* This hwdec implements an optimized output path using CUDA->OpenGL
* interop for frame data that is stored in CUDA device memory.
* Although it is not explicit in the code here, the only practical way
* to get data in this form is from the 'cuvid' decoder (aka NvDecode).
*
* For now, cuvid/NvDecode will always return images in NV12 format, even
* when decoding 10bit streams (there is some hardware dithering going on).
*/
#include "cuda_dynamic.h"
#include <libavutil/hwcontext.h>
#include <libavutil/hwcontext_cuda.h>
vo_opengl: refactor into vo_gpu This is done in several steps: 1. refactor MPGLContext -> struct ra_ctx 2. move GL-specific stuff in vo_opengl into opengl/context.c 3. generalize context creation to support other APIs, and add --gpu-api 4. rename all of the --opengl- options that are no longer opengl-specific 5. move all of the stuff from opengl/* that isn't GL-specific into gpu/ (note: opengl/gl_utils.h became opengl/utils.h) 6. rename vo_opengl to vo_gpu 7. to handle window screenshots, the short-term approach was to just add it to ra_swchain_fns. Long term (and for vulkan) this has to be moved to ra itself (and vo_gpu altered to compensate), but this was a stop-gap measure to prevent this commit from getting too big 8. move ra->fns->flush to ra_gl_ctx instead 9. some other minor changes that I've probably already forgotten Note: This is one half of a major refactor, the other half of which is provided by rossy's following commit. This commit enables support for all linux platforms, while his version enables support for all non-linux platforms. Note 2: vo_opengl_cb.c also re-uses ra_gl_ctx so it benefits from the --opengl- options like --opengl-early-flush, --opengl-finish etc. Should be a strict superset of the old functionality. Disclaimer: Since I have no way of compiling mpv on all platforms, some of these ports were done blindly. Specifically, the blind ports included context_mali_fbdev.c and context_rpi.c. Since they're both based on egl_helpers, the port should have gone smoothly without any major changes required. But if somebody complains about a compile error on those platforms (assuming anybody actually uses them), you know where to complain.
2017-09-14 06:04:55 +00:00
#include "video/out/gpu/hwdec.h"
#include "formats.h"
#include "options/m_config.h"
#include "ra_gl.h"
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
struct priv_owner {
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
struct mp_hwdec_ctx hwctx;
CUcontext display_ctx;
CUcontext decode_ctx;
};
struct priv {
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
struct mp_image layout;
CUgraphicsResource cu_res[4];
CUarray cu_array[4];
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
CUcontext display_ctx;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
};
static int check_cu(struct ra_hwdec *hw, CUresult err, const char *func)
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
{
const char *err_name;
const char *err_string;
MP_TRACE(hw, "Calling %s\n", func);
if (err == CUDA_SUCCESS)
return 0;
cuGetErrorName(err, &err_name);
cuGetErrorString(err, &err_string);
MP_ERR(hw, "%s failed", func);
if (err_name && err_string)
MP_ERR(hw, " -> %s: %s", err_name, err_string);
MP_ERR(hw, "\n");
return -1;
}
#define CHECK_CU(x) check_cu(hw, (x), #x)
static int cuda_init(struct ra_hwdec *hw)
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
{
CUdevice display_dev;
AVBufferRef *hw_device_ctx = NULL;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
CUcontext dummy;
unsigned int device_count;
int ret = 0;
struct priv_owner *p = hw->priv;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
if (!ra_is_gl(hw->ra))
return -1;
GL *gl = ra_gl_get(hw->ra);
if (gl->version < 210 && gl->es < 300) {
MP_VERBOSE(hw, "need OpenGL >= 2.1 or OpenGL-ES >= 3.0\n");
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
return -1;
}
bool loaded = cuda_load();
if (!loaded) {
MP_VERBOSE(hw, "Failed to load CUDA symbols\n");
return -1;
}
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
ret = CHECK_CU(cuInit(0));
if (ret < 0)
goto error;
// Allocate display context
ret = CHECK_CU(cuGLGetDevices(&device_count, &display_dev, 1,
CU_GL_DEVICE_LIST_ALL));
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
if (ret < 0)
goto error;
ret = CHECK_CU(cuCtxCreate(&p->display_ctx, CU_CTX_SCHED_BLOCKING_SYNC,
display_dev));
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
if (ret < 0)
goto error;
p->decode_ctx = p->display_ctx;
int decode_dev_idx = -1;
mp_read_option_raw(hw->global, "cuda-decode-device", &m_option_type_choice,
&decode_dev_idx);
if (decode_dev_idx > -1) {
CUdevice decode_dev;
ret = CHECK_CU(cuDeviceGet(&decode_dev, decode_dev_idx));
if (ret < 0)
goto error;
if (decode_dev != display_dev) {
MP_INFO(hw, "Using separate decoder and display devices\n");
// Pop the display context. We won't use it again during init()
ret = CHECK_CU(cuCtxPopCurrent(&dummy));
if (ret < 0)
goto error;
ret = CHECK_CU(cuCtxCreate(&p->decode_ctx, CU_CTX_SCHED_BLOCKING_SYNC,
decode_dev));
if (ret < 0)
goto error;
}
}
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
hw_device_ctx = av_hwdevice_ctx_alloc(AV_HWDEVICE_TYPE_CUDA);
if (!hw_device_ctx)
goto error;
AVHWDeviceContext *device_ctx = (void *)hw_device_ctx->data;
AVCUDADeviceContext *device_hwctx = device_ctx->hwctx;
device_hwctx->cuda_ctx = p->decode_ctx;
ret = av_hwdevice_ctx_init(hw_device_ctx);
if (ret < 0) {
MP_ERR(hw, "av_hwdevice_ctx_init failed\n");
goto error;
}
ret = CHECK_CU(cuCtxPopCurrent(&dummy));
if (ret < 0)
goto error;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
p->hwctx = (struct mp_hwdec_ctx) {
.driver_name = hw->driver->name,
.av_device_ref = hw_device_ctx,
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
};
hwdec_devices_add(hw->devs, &p->hwctx);
return 0;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
error:
av_buffer_unref(&hw_device_ctx);
CHECK_CU(cuCtxPopCurrent(&dummy));
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
return -1;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
}
static void cuda_uninit(struct ra_hwdec *hw)
{
struct priv_owner *p = hw->priv;
hwdec_devices_remove(hw->devs, &p->hwctx);
av_buffer_unref(&p->hwctx.av_device_ref);
if (p->decode_ctx && p->decode_ctx != p->display_ctx)
CHECK_CU(cuCtxDestroy(p->decode_ctx));
if (p->display_ctx)
CHECK_CU(cuCtxDestroy(p->display_ctx));
}
#undef CHECK_CU
#define CHECK_CU(x) check_cu((mapper)->owner, (x), #x)
static int mapper_init(struct ra_hwdec_mapper *mapper)
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
{
struct priv_owner *p_owner = mapper->owner->priv;
struct priv *p = mapper->priv;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
CUcontext dummy;
int ret = 0, eret = 0;
p->display_ctx = p_owner->display_ctx;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
int imgfmt = mapper->src_params.hw_subfmt;
mapper->dst_params = mapper->src_params;
mapper->dst_params.imgfmt = imgfmt;
mapper->dst_params.hw_subfmt = 0;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
mp_image_set_params(&p->layout, &mapper->dst_params);
struct ra_imgfmt_desc desc;
if (!ra_get_imgfmt_desc(mapper->ra, imgfmt, &desc)) {
MP_ERR(mapper, "Unsupported format: %s\n", mp_imgfmt_to_name(imgfmt));
return -1;
}
ret = CHECK_CU(cuCtxPushCurrent(p->display_ctx));
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
if (ret < 0)
return ret;
for (int n = 0; n < desc.num_planes; n++) {
const struct ra_format *format = desc.planes[n];
struct ra_tex_params params = {
.dimensions = 2,
.w = mp_image_plane_w(&p->layout, n),
.h = mp_image_plane_h(&p->layout, n),
.d = 1,
.format = format,
.render_src = true,
.src_linear = format->linear_filter,
};
mapper->tex[n] = ra_tex_create(mapper->ra, &params);
if (!mapper->tex[n]) {
ret = -1;
goto error;
}
GLuint texture;
GLenum target;
ra_gl_get_raw_tex(mapper->ra, mapper->tex[n], &texture, &target);
ret = CHECK_CU(cuGraphicsGLRegisterImage(&p->cu_res[n], texture, target,
CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD));
if (ret < 0)
goto error;
ret = CHECK_CU(cuGraphicsMapResources(1, &p->cu_res[n], 0));
if (ret < 0)
goto error;
ret = CHECK_CU(cuGraphicsSubResourceGetMappedArray(&p->cu_array[n], p->cu_res[n],
0, 0));
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
if (ret < 0)
goto error;
ret = CHECK_CU(cuGraphicsUnmapResources(1, &p->cu_res[n], 0));
if (ret < 0)
goto error;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
}
error:
eret = CHECK_CU(cuCtxPopCurrent(&dummy));
if (eret < 0)
return eret;
return ret;
}
static void mapper_uninit(struct ra_hwdec_mapper *mapper)
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
{
struct priv *p = mapper->priv;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
CUcontext dummy;
// Don't bail if any CUDA calls fail. This is all best effort.
CHECK_CU(cuCtxPushCurrent(p->display_ctx));
for (int n = 0; n < 4; n++) {
if (p->cu_res[n] > 0)
CHECK_CU(cuGraphicsUnregisterResource(p->cu_res[n]));
p->cu_res[n] = 0;
ra_tex_free(mapper->ra, &mapper->tex[n]);
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
}
CHECK_CU(cuCtxPopCurrent(&dummy));
}
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
static void mapper_unmap(struct ra_hwdec_mapper *mapper)
{
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
}
static int mapper_map(struct ra_hwdec_mapper *mapper)
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
{
struct priv *p = mapper->priv;
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
CUcontext dummy;
int ret = 0, eret = 0;
ret = CHECK_CU(cuCtxPushCurrent(p->display_ctx));
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
if (ret < 0)
return ret;
for (int n = 0; n < p->layout.num_planes; n++) {
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
CUDA_MEMCPY2D cpy = {
.srcMemoryType = CU_MEMORYTYPE_DEVICE,
.dstMemoryType = CU_MEMORYTYPE_ARRAY,
.srcDevice = (CUdeviceptr)mapper->src->planes[n],
.srcPitch = mapper->src->stride[n],
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
.srcY = 0,
.dstArray = p->cu_array[n],
.WidthInBytes = mp_image_plane_w(&p->layout, n) *
mapper->tex[n]->params.format->pixel_size,
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
.Height = mp_image_plane_h(&p->layout, n),
};
ret = CHECK_CU(cuMemcpy2D(&cpy));
if (ret < 0)
goto error;
}
hwdec/opengl: Add support for CUDA and cuvid/NvDecode Nvidia's "NvDecode" API (up until recently called "cuvid" is a cross platform, but nvidia proprietary API that exposes their hardware video decoding capabilities. It is analogous to their DXVA or VDPAU support on Windows or Linux but without using platform specific API calls. As a rule, you'd rather use DXVA or VDPAU as these are more mature and well supported APIs, but on Linux, VDPAU is falling behind the hardware capabilities, and there's no sign that nvidia are making the investments to update it. Most concretely, this means that there is no VP8/9 or HEVC Main10 support in VDPAU. On the other hand, NvDecode does export vp8/9 and partial support for HEVC Main10 (more on that below). ffmpeg already has support in the form of the "cuvid" family of decoders. Due to the design of the API, it is best exposed as a full decoder rather than an hwaccel. As such, there are decoders like h264_cuvid, hevc_cuvid, etc. These decoders support two output paths today - in both cases, NV12 frames are returned, either in CUDA device memory or regular system memory. In the case of the system memory path, the decoders can be used as-is in mpv today with a command line like: mpv --vd=lavc:h264_cuvid foobar.mp4 Doing this will take advantage of hardware decoding, but the cost of the memcpy to system memory adds up, especially for high resolution video (4K etc). To avoid that, we need an hwdec that takes advantage of CUDA's OpenGL interop to copy from device memory into OpenGL textures. That is what this change implements. The process is relatively simple as only basic device context aquisition needs to be done by us - the CUDA buffer pool is managed by the decoder - thankfully. The hwdec looks a bit like the vdpau interop one - the hwdec maintains a single set of plane textures and each output frame is repeatedly mapped into these textures to pass on. The frames are always in NV12 format, at least until 10bit output supports emerges. The only slightly interesting part of the copying process is that CUDA works by associating PBOs, so we need to define these for each of the textures. TODO Items: * I need to add a download_image function for screenshots. This would do the same copy to system memory that the decoder's system memory output does. * There are items to investigate on the ffmpeg side. There appears to be a problem with timestamps for some content. Final note: I mentioned HEVC Main10. While there is no 10bit output support, NvDecode can return dithered 8bit NV12 so you can take advantage of the hardware acceleration. This particular mode requires compiling ffmpeg with a modified header (or possibly the CUDA 8 RC) and is not upstream in ffmpeg yet. Usage: You will need to specify vo=opengl and hwdec=cuda. Note that hwdec=auto will probably not work as it will try to use vdpau first. mpv --hwdec=cuda --vo=opengl foobar.mp4 If you want to use filters that require frames in system memory, just use the decoder directly without the hwdec, as documented above.
2016-09-04 22:23:55 +00:00
error:
eret = CHECK_CU(cuCtxPopCurrent(&dummy));
if (eret < 0)
return eret;
return ret;
}
const struct ra_hwdec_driver ra_hwdec_cuda = {
.name = "cuda-nvdec",
.imgfmts = {IMGFMT_CUDA, 0},
.priv_size = sizeof(struct priv_owner),
.init = cuda_init,
.uninit = cuda_uninit,
.mapper = &(const struct ra_hwdec_mapper_driver){
.priv_size = sizeof(struct priv),
.init = mapper_init,
.uninit = mapper_uninit,
.map = mapper_map,
.unmap = mapper_unmap,
},
};