We need to support hardware/drivers which do not support ARGB8888 in
their primary plane.
We also use p->primary_plane_format when creating the gbm surface, to
make sure it always matches (in actuality there should be little
difference).
Passing in an invalid DRM overlay id with the --drm-overlay option would
cause drmplane to be freed twice: once in the for-loop and once at the
error-handler label fail.
Solve by setting drmpanel to NULL after freeing it.
Also the 'return false' statement after the error handler label should
probably be 'return NULL', given that the return type of
drm_atomic_create_context returns a pointer.
vo_x11 and vo_xv need this. According to the Linux manpage, all involved
functions are POSIX-2001 anyway. (I just assumed they were not, because
they're mostly System V UNIX legacy garbage.)
Finally get rid of all the HWDEC_* things, and instead rely on the
libavutil equivalents. vdpau still uses a shitty hack, but fuck the
vdpau code.
Remove all the now unneeded remains. The vdpau preemption thing was not
unused anymore; if someone cares this could probably be restored.
With the recent changes, mpv's internal mechanisms got synced to
libavcodec's once more. Some things are still needed for filters (until
the mechanism gets replaced), but there's no need to require other hwdec
methods to use these fields. So remove them where they are unnecessary.
Also fix some minor leaks in the dxva2 backends, and set the driver_name
field in the Apple ones. Untested on Apple crap.
It makes more sense to have it in the general video directory (along
with vdpau.c and vaapi.c), since the decoder source files don't even
access it anymore.
The testing_only field is not referenced anymore with vaglx removed and
the previous commit dropping all uses.
The ra_hwdec_driver.api field became unused with the previous commit,
but all hwdec interop drivers still initialized it.
Since this touches highly OS-specific code, build regressions are
possible (plus the previous commit might break hw decoding at runtime).
At least hwdec_cuda.c still used the .api field, other than initializing
it.
Make the VO<->decoder interface capable of supporting multiple hwdec
APIs at once. The main gain is that this simplifies autoprobing a lot.
Before this change, it could happen that the VO loaded the "wrong" hwdec
API, and the decoder was stuck with the choice (breaking hw decoding).
With the change applied, the VO simply loads all available APIs, so
autoprobing trickery is left entirely to the decoder.
In the past, we were quite careful about not accidentally loading the
wrong interop drivers. This was in part to make sure autoprobing works,
but also because libva had this obnoxious bug of dumping garbage to
stderr when using the API. libva was fixed, so this is not a problem
anymore.
The --opengl-hwdec-interop option is changed in various ways (again...),
and renamed to --gpu-hwdec-interop. It does not have much use anymore,
other than debugging. It's notable that the order in the hwdec interop
array ra_hwdec_drivers[] still matters if multiple drivers support the
same image formats, so the option can explicitly force one, if that
should ever be necessary, or more likely, for debugging. One example are
the ra_hwdec_d3d11egl and ra_hwdec_d3d11eglrgb drivers, which both
support d3d11 input.
vo_gpu now always loads the interop lazily by default, but when it does,
it loads them all. vo_opengl_cb now always loads them when the GL
context handle is initialized. I don't expect that this causes any
problems.
It's now possible to do things like changing between vdpau and nvdec
decoding at runtime.
This is also preparation for cleaning up vd_lavc.c hwdec autoprobing.
It's another reason why hwdec_devices_request_all() does not take a
hwdec type anymore.
nvdec aka cuvid aka cuda should work much better than vdpau, and support
newer codecs (such as vp9), and more advanced surface formats (like 10
bit).
This requires moving the d3d hwaccels in the autoprobe order, since on
Windows, d3d decoding should be preferred over nvidia proprietary stuff.
Users of older drivers will need to force --hwdec=vdpau, since it could
happen that the vo_gpu cuda hwdec interop loads (so the vdpau interop is
not loaded), but the hwdec itself doesn't work.
I expect this does not break AMD (which still needs vdpau for vo_gpu
interop, until libva is fixed so it can fully support AMD).
This has stopped being useful a long time ago, and it's the only GPL
source file in the vo_gpu source directories. Recently it wasn't even
loaded at all, unless you forced loading it.
The D3D11_CREATE_DEVICE_BGRA_SUPPORT flag doesn't enable support for
BGRA textures. BGRA textures will be supported whether or not the flag
is passed. The flag just fails device creation if they are not supported
as an API convenience for programs that need BGRA textures, such as
programs that use D2D or D3D9 interop. We can handle devices without
BGRA support fine, so don't bother with the flag.
For consistency with already implemented shcore.dll
function loading in w32->api:
Moved loading of imm32.dll to w32_api_load, and declare
pImmDisableIME function pointer in the w32->api struct.
Removed unloading of imm32.dll.
Seems like the last refactor to this code broke playing flipped images,
at least with --opengl-pbo --gpu-api=opengl.
Flipping is rather a shitmess. The main problem is that OpenGL does not
support flipped uploading. The original vo_gl implementation considered
it important to handle the flipped case efficiently, so instead of
uploading the image line by line backwards, it uploaded it flipped, and
then flipped it in the renderer (basically the upload path ignored the
flipping). The ra code and backends probably have an insane and
inconsistent mix of semantics, so fix this by never passing it flipped
images in the first place.
In the future, the backends should probably support flipped images
directly.
Fixes#5097.
Like the manual says, this is technically undefined behaviour. See:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff476085.aspx
In particular, MSDN says texture arrays created with the BIND_DECODER
flag cannot be used with CreateShaderResourceView, which means they
can't be sampled through SRVs like normal Direct3D textures. However,
some programs (Google Chrome included) do this anyway for performance
and power-usage reasons, and it appears to work with most drivers.
Older AMD drivers had a "bug" with zero-copy decoding, but this appears
to have been fixed. See #3255, #3464 and http://crbug.com/623029.
The shader cache in ra_d3d11 caches the result of shaderc, crossc and
the D3DCompiler DLL, so it should be invalidated when any of those
components are updated. This should make the cache more reliable, which
makes it safer to enable gpu-shader-cache-dir. Shader compilation is
slow with D3D11, so gpu-shader-cache-dir is highly necessary
Some shaders take a _long_ time to compile with the Direct3D compiler.
The ANGLE backend had this problem too, to a certain extent. Logging
should help identify which shaders cause long stalls and could also help
with benchmarking ways of reducing compile times.
ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is
then translated to HLSL. This means it always exposes the same GLSL
version that the SPIR-V compiler supports (4.50 for shaderc/glslang.)
Despite claiming to support GLSL 4.50, some features that are tied to
the GLSL version in OpenGL are not supported by ra_d3d11 when targeting
legacy Direct3D feature levels.
This includes two features that mpv relies on:
- Reading from gl_FragCoord in the fragment shader (requires FL 10_0)
- textureGather from any texture component (requires FL 11_0)
These features have been exposed as new RA caps.
This is a new RA/vo_gpu backend that uses Direct3D 11. The GLSL
generated by vo_gpu is cross-compiled to HLSL with SPIRV-Cross.
What works:
- All of mpv's internal shaders should work, including compute shaders.
- Some external shaders have been tested and work, including RAVU and
adaptive-sharpen.
- Non-dumb mode works, even on very old hardware. Most features work at
feature level 9_3 and all features work at feature level 10_0. Some
features also work at feature level 9_1 and 9_2, but without high-bit-
depth FBOs, it's not very useful. (Hardware this old is probably not
fast enough for advanced features anyway.)
Note: This is more compatible than ANGLE, which requires 9_3 to work
at all (GLES 2.0,) and 10_1 for non-dumb-mode (GLES 3.0.)
- Hardware decoding with D3D11VA, including decoding of 10-bit formats
without truncation to 8-bit.
What doesn't work / can be improved:
- PBO upload and direct rendering does not work yet. Direct rendering
requires persistent-mapped PBOs because the decoder needs to be able
to read data from images that have already been decoded and uploaded.
Unfortunately, it seems like persistent-mapped PBOs are fundamentally
incompatible with D3D11, which requires all resources to use driver-
managed memory and requires memory to be unmapped (and hence pointers
to be invalidated) when a resource is used in a draw or copy
operation.
However it might be possible to use D3D11's limited multithreading
capabilities to emulate some features of PBOs, like asynchronous
texture uploading.
- The blit() and clear() operations don't have equivalents in the D3D11
API that handle all cases, so in most cases, they have to be emulated
with a shader. This is currently done inside ra_d3d11, but ideally it
would be done in generic code, so it can take advantage of mpv's
shader generation utilities.
- SPIRV-Cross is used through a NIH C-compatible wrapper library, since
it does not expose a C interface itself.
The library is available here: https://github.com/rossy/crossc
- The D3D11 context could be made to support more modern DXGI features
in future. For example, it should be possible to add support for
high-bit-depth and HDR output with DXGI 1.5/1.6.
Backported from @haasn's change to libplacebo, except in the current RA,
there's nothing to indicate an ra_format can be bound as a storage
image, so there's no way to force all of these formats to have a
glsl_format. Instead, the layout qualifier will be removed if
glsl_format is NULL.
This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while
loading float values from unorm images often works as expected, it's
technically undefined behaviour, and in Windows 10, it will cause the
debug layer to spam the log with error messages. Also, apparently in
GLSL, the format name must match the image's format exactly (but in
Direct3D, it just has to have the same component type.)
Backported from @haasn's change to libplacebo. More flexible than the
previous "shared || non-shared" distinction. The extra flexibility is
needed for Direct3D 11, but it also doesn't hurt code-wise.
For some reason vo_lavc's draw_image can buffer the frame and encode it
only later. Also, there is logic for rendering the OSD (i.e. subtitles)
only when needed.
In theory this can lead to subtitles being pruned before it tries to
render them (as the subtitle logic doesn't know that the VO still needs
them later), although this probably never happens in reality.
The worse issue, that actually happened, is that if the last frame gets
buffered, it attempts to render subtitles in the uninit callback. At
this point, the subtitle decoder is already torn down and all subtitles
removed, thus it will draw nothing. This didn't always happen. I'm not
sure why - potentially in the working cases, the frame wasn't buffered.
Since this logic doesn't have much worth, except a minor performance
advantage if frames with subtitles are dropped, just remove it.
Hopefully fixes#4689.
Repeating frames (for display-sync) is not supposed to render the entire
frame again. When using hardware decoding, it unfortunately did: the
renderer uses the frame ID to check whether the frame data changed, and
unmapping the hwdec frame clears it.
Essentially reverts commit 761eeacf54. Back then I probably
thought it would be a good idea to release the hwdec image quickly in
order to return it to the decoder, but they're referenced anyway.
This should increase the performance and reduce GPU work.
Normally such code is didsabled by have_mglsl==false in
check_gl_features(), but apparently not this one.
Just fix it. Seems also more readable.
Fixes#5069.
Apparently this is required, but it doesn't check for it. To be fair,
this was tested by creating a compatibility context and pretending it's
GL 2.1. GL_ARB_shader_storage_buffer_object actually requires GL 4.0 or
up, but GL_ARB_uniform_buffer_object requires only GL 2.0.
vo_gpu.c will call gl_video_icc_auto_enabled() to check whether it
should retrieve the ICC profile. But the value returned by this function
will be outdated, because gl_video_update_options() is not called yet.
Change the order of function calls so that this is done after updating
the options.
(This is fairly chaotic, but I guess this code will be refactored a
dozen of times anyway in the future.)
This is just a dumb consequence of HWDEC_ types somehow being part of
both decoder and VO. Obviously, the VO should only care about supporting
specific hardware surface types or providing specific device types, but
until they are separated, stupid unintuitive mismatches will occur.
See manpage additions.
(In ffmpeg-mpv and Libav, this is still called "cuvid". Libav won't work
yet, because it has no frame params support yet, but this could get
fixed soon.)
params->rc was ignored in the calculation for the buffer size. I fucking
hate this stupid ra_tex_upload signature where *rc is randomly relevant
or not.
Coverity complains about this, but it's probably a false positive.
Anyway, rewrite it in a slightly more readable way. Now it's more
obvious that it is correct.
Comparing mpv's implementation against the ACES ODR reference samples
and algorithms, it seems like they're happy desaturating highlights
_way_ more aggressively than mpv currently does. And indeed, looking at
some example clips like The Redwoods (which is actually well-mastered),
the current desaturation produces unnatural-looking brightness fringes
where the sky meets the treeline.
Adjust the algorithm to make it apply to a much larger, more gradual
brightness region; and change the interpretation of the parameter. As a
bonus, the new parameter is actually sanely scaled (higher values = more
desaturation). Also, make it scale based on the signal level instead of
the luminance, to avoid under-desaturating bright blues.
This commit allows to use the AV_PIX_FMT_DRM_PRIME newly introduced
format in ffmpeg that allows decoders to provide an AVDRMFrameDescriptor
struct.
That struct holds dmabuf fds and information allowing zerocopy rendering
using KMS / DRM Atomic.
This has been tested on RockChip ROCK64 device.
Since we divide by it in a couple of places and compositors can be crazy,
its better to be safe than sorry.
Also checks cursor spawn durinig init (pointless since it does again on
cursor entry but its more correct).