This allows the new GPU screenshot functionality introduced in
9f595f3a80ee to work with the D3D11 backend. It replaces the old window
screenshot functionality, which was shared between D3D11 and ANGLE. The
old code can be removed, since it's not needed by ANGLE anymore either.
Get rid of the old vf.c code. Replace it with a generic filtering
framework, which can potentially handle more than just --vf. At least
reimplementing --af with this code is planned.
This changes some --vf semantics (including runtime behavior and the
"vf" command). The most important ones are listed in interface-changes.
vf_convert.c is renamed to f_swscale.c. It is now an internal filter
that can not be inserted by the user manually.
f_lavfi.c is a refactor of player/lavfi.c. The latter will be removed
once --lavfi-complex is reimplemented on top of f_lavfi.c. (which is
conceptually easy, but a big mess due to the data flow changes).
The existing filters are all changed heavily. The data flow of the new
filter framework is different. Especially EOF handling changes - EOF is
now a "frame" rather than a state, and must be passed through exactly
once.
Another major thing is that all filters must support dynamic format
changes. The filter reconfig() function goes away. (This sounds complex,
but since all filters need to handle EOF draining anyway, they can use
the same code, and it removes the mess with reconfig() having to predict
the output format, which completely breaks with libavfilter anyway.)
In addition, there is no automatic format negotiation or conversion.
libavfilter's primitive and insufficient API simply doesn't allow us to
do this in a reasonable way. Instead, filters can use f_autoconvert as
sub-filter, and tell it which formats they support. This filter will in
turn add actual conversion filters, such as f_swscale, to perform
necessary format changes.
vf_vapoursynth.c uses the same basic principle of operation as before,
but with worryingly different details in data flow. Still appears to
work.
The hardware deint filters (vf_vavpp.c, vf_d3d11vpp.c, vf_vdpaupp.c) are
heavily changed. Fortunately, they all used refqueue.c, which is for
sharing the data flow logic (especially for managing future/past
surfaces and such). It turns out it can be used to factor out most of
the data flow. Some of these filters accepted software input. Instead of
having ad-hoc upload code in each filter, surface upload is now
delegated to f_autoconvert, which can use f_hwupload to perform this.
Exporting VO capabilities is still a big mess (mp_stream_info stuff).
The D3D11 code drops the redundant image formats, and all code uses the
hw_subfmt (sw_format in FFmpeg) instead. Although that too seems to be a
big mess for now.
f_async_queue is unused.
This enables DXVA2 hardware decoding with ra_d3d11. It should be useful
for Windows 7, where D3D11VA is not available. Images are transfered
from D3D9 to D3D11 using D3D9Ex surface sharing[1].
Following Microsoft's recommendations, it uses a queue of shared
surfaces, similar to Microsoft's ISurfaceQueue. This will hopefully
prevent surface sharing from impacting parallelism and allow multiple
D3D11 frames to be in-flight at once.
[1]: https://msdn.microsoft.com/en-us/library/windows/desktop/ee913554.aspx
In a lost device scenario, resize() will fail and p->backbuffer will be
NULL. We can't recover from lost devices yet, but we should still check
for a NULL backbuffer in start_frame() rather than crashing.
Also remove a NULL check for p->swapchain. This was a red herring, since
p->swapchain never becomes NULL in an error condition, but p->backbuffer
actually does.
This should fix the crash in #5320, but it doesn't fix the underlying
reason for the lost device (which is probably a driver bug.)
Apparently some Intel drivers have a bug where copying from staging
buffers to constant buffers does not work. We used to keep a copy of the
buffer data in a staging buffer to enable partial constant buffer
updates. To work around this bug, keep the copy in talloc-allocated
system memory instead.
There doesn't seem to be any noticable performance difference from
keeping the copy in system memory. Our cbuffers are probably too small
for it to matter anyway.
See also: https://crbug.com/593024Fixes#5293
Finally get rid of all the HWDEC_* things, and instead rely on the
libavutil equivalents. vdpau still uses a shitty hack, but fuck the
vdpau code.
Remove all the now unneeded remains. The vdpau preemption thing was not
unused anymore; if someone cares this could probably be restored.
It makes more sense to have it in the general video directory (along
with vdpau.c and vaapi.c), since the decoder source files don't even
access it anymore.
The testing_only field is not referenced anymore with vaglx removed and
the previous commit dropping all uses.
The ra_hwdec_driver.api field became unused with the previous commit,
but all hwdec interop drivers still initialized it.
Since this touches highly OS-specific code, build regressions are
possible (plus the previous commit might break hw decoding at runtime).
At least hwdec_cuda.c still used the .api field, other than initializing
it.
Like the manual says, this is technically undefined behaviour. See:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff476085.aspx
In particular, MSDN says texture arrays created with the BIND_DECODER
flag cannot be used with CreateShaderResourceView, which means they
can't be sampled through SRVs like normal Direct3D textures. However,
some programs (Google Chrome included) do this anyway for performance
and power-usage reasons, and it appears to work with most drivers.
Older AMD drivers had a "bug" with zero-copy decoding, but this appears
to have been fixed. See #3255, #3464 and http://crbug.com/623029.
The shader cache in ra_d3d11 caches the result of shaderc, crossc and
the D3DCompiler DLL, so it should be invalidated when any of those
components are updated. This should make the cache more reliable, which
makes it safer to enable gpu-shader-cache-dir. Shader compilation is
slow with D3D11, so gpu-shader-cache-dir is highly necessary
Some shaders take a _long_ time to compile with the Direct3D compiler.
The ANGLE backend had this problem too, to a certain extent. Logging
should help identify which shaders cause long stalls and could also help
with benchmarking ways of reducing compile times.
ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is
then translated to HLSL. This means it always exposes the same GLSL
version that the SPIR-V compiler supports (4.50 for shaderc/glslang.)
Despite claiming to support GLSL 4.50, some features that are tied to
the GLSL version in OpenGL are not supported by ra_d3d11 when targeting
legacy Direct3D feature levels.
This includes two features that mpv relies on:
- Reading from gl_FragCoord in the fragment shader (requires FL 10_0)
- textureGather from any texture component (requires FL 11_0)
These features have been exposed as new RA caps.
This is a new RA/vo_gpu backend that uses Direct3D 11. The GLSL
generated by vo_gpu is cross-compiled to HLSL with SPIRV-Cross.
What works:
- All of mpv's internal shaders should work, including compute shaders.
- Some external shaders have been tested and work, including RAVU and
adaptive-sharpen.
- Non-dumb mode works, even on very old hardware. Most features work at
feature level 9_3 and all features work at feature level 10_0. Some
features also work at feature level 9_1 and 9_2, but without high-bit-
depth FBOs, it's not very useful. (Hardware this old is probably not
fast enough for advanced features anyway.)
Note: This is more compatible than ANGLE, which requires 9_3 to work
at all (GLES 2.0,) and 10_1 for non-dumb-mode (GLES 3.0.)
- Hardware decoding with D3D11VA, including decoding of 10-bit formats
without truncation to 8-bit.
What doesn't work / can be improved:
- PBO upload and direct rendering does not work yet. Direct rendering
requires persistent-mapped PBOs because the decoder needs to be able
to read data from images that have already been decoded and uploaded.
Unfortunately, it seems like persistent-mapped PBOs are fundamentally
incompatible with D3D11, which requires all resources to use driver-
managed memory and requires memory to be unmapped (and hence pointers
to be invalidated) when a resource is used in a draw or copy
operation.
However it might be possible to use D3D11's limited multithreading
capabilities to emulate some features of PBOs, like asynchronous
texture uploading.
- The blit() and clear() operations don't have equivalents in the D3D11
API that handle all cases, so in most cases, they have to be emulated
with a shader. This is currently done inside ra_d3d11, but ideally it
would be done in generic code, so it can take advantage of mpv's
shader generation utilities.
- SPIRV-Cross is used through a NIH C-compatible wrapper library, since
it does not expose a C interface itself.
The library is available here: https://github.com/rossy/crossc
- The D3D11 context could be made to support more modern DXGI features
in future. For example, it should be possible to add support for
high-bit-depth and HDR output with DXGI 1.5/1.6.