This file claims to be based on the "MPlayer VA-API patch", but this is
untrue. Only some glue code was copied from hwdec_vaglx.c, and this glue
code was never in MPlayer or the MPlayer VA-API patch in any form; it
was instead part of the mpv-original way we do hardware decoding OpenGL
interop. The EGL interop method didn't exist at the time the MPlayer
VA-API patch was created either.
GLSL in GLES 2.0 did not have line continuation in its preprocessor.
This broke shader compilation. It also broke subtitle rendering in
vo_rpi, which reuses some of the OpenGL code.
Line continuation was finally added in GLES 3.0, which is perhaps the
reason why ANGLE accepted it.
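For illustration, the kind of construct that broke (a made-up macro; the
backslash-newline sequence inside the shader string is the problem):

    // rejected by a strict GLES 2.0 preprocessor:
    const char *bad  = "#define MUL2(x) \\\n"
                       "    ((x) * 2.0)\n";
    // works everywhere:
    const char *good = "#define MUL2(x) ((x) * 2.0)\n";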
Untested, but should be fine. Broken by commit 0a0bb905.
Also fix the include statement in context_rpi.c, which caused another
compilation failure. Also untested. (Because I'm lazy.)
Fixes #2638.
gcc 4.8 does not support C11 thread local storage. This is a bit
annoying, so add a hack to use the gcc specific __thread extension if
C11 TLS is not available.
(This is used for the extremely silly mpv-internal way hwdec modules
access some platform specific handles. Disabling it simply made
hwdec_vaegl.c always fail initialization.)
Fixes #2631.
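A sketch of the hack (macro name hypothetical): gcc only gained
_Thread_local in 4.9, and gcc 4.8 claims C11 via __STDC_VERSION__
anyway, so test the compiler version instead:

    #if defined(__GNUC__) && !defined(__clang__) && \
        (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9))
    #define MP_THREAD_LOCAL __thread
    #else
    #define MP_THREAD_LOCAL _Thread_local
    #endif

    static MP_THREAD_LOCAL void *tls_platform_handle; // hypothetical use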
Add a "blend-tiles" choice to the "alpha" sub-option. This is pretty
simplistic and uses the GL raster position to derive the tiles. A weird
consequence is that using --vo=opengl and --vo=opengl-hq gives different
scaling behavior (screenspace pixel size vs. source video pixel size
16x16 tiles), but it seems we don't have easy access to the original
texture coordinates. Using the rasterpos is probably simpler.
Make this option the default.
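Roughly what this amounts to in the generated shader (a sketch of the
emitted GLSL; tile size and colors illustrative, premultiplied alpha
assumed):

    const char *glsl =
        "vec2 tile = floor(gl_FragCoord.xy / 16.0);\n"
        "vec3 bg = mix(vec3(1.0), vec3(0.75), mod(tile.x + tile.y, 2.0));\n"
        "color.rgb += bg * (1.0 - color.a);\n";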
long is 64 bits on x86_64 on Linux, which means the check for the corner
case of computing the depth mask is wrong.
Also, X11 compositors seem to expect premultiplied alpha.
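The gist (an illustrative guess at the pattern, not the exact code):
computing a mask for a given depth needs a guard against shifting by the
full width of the type, and that guard must not assume long is 32 bits:

    unsigned long mask = depth >= sizeof(unsigned long) * 8
                             ? ~0UL : (1UL << depth) - 1;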
Since alpha isn't pulled through the colormatrix (maybe it should?), we
reject alpha formats with odd sizes, such as yuva444p10.
But the awful tex_mul path in vo_opengl does this anyway (at some points
even explicitly), which means there will be a subtle difference in
handling of 16 bit yuv alpha formats. Make it consistent and always
apply the range adjustment to the alpha component. This also means odd
sizes like 10 bit are supported now.
This assumes alpha uses the same "shifted" range as the yuv color
channels for depths larger than 8 bit. I'm not sure whether this is
actually the case.
Now common.c only contains the code for the function loader, while
context.c contains the backend loader/dispatcher.
Not calling it "backend.c", because the central struct is called
MPGLContext.
This is used for dithering, although I'm not aware of anyone who got
higher than 8 bit depth support to work on Linux.
Also put this into egl_helpers.c. Since EGL is pseudo-portable at best I
have no hope that the EGL context creation code in all the backends can
be fully shared. But some self-contained functionality can definitely be
shared.
Store the determined framebuffer depth in struct GL instead of
MPGLContext. This means gl_video_set_output_depth() can be removed, and
also justifies adding new fields describing framebuffer/backend
properties to struct GL instead of having to add more functions just to
shovel the information around.
Keep in mind that mpgl_load_functions() will wipe struct GL, so the
new fields must be set before calling it.
Although the source file is named w32.c, the backend name was "win"
until recently. It was accidentally changed to "w32"; fix it.
Fixes #2608 (the manual is correct).
When a Direct3D 9Ex device fails to reset, it gets put into the lost
state, so set the lost_device flag and don't attempt to present until
the device moves out of that state. Failure to recreate the size-
dependent objects should set lost_device as well, since we shouldn't try
to present in that state.
Also, it looks like I was too eager to remove code that sets priv
members to NULL and I accidentally removed some that was needed.
Direct3D doesn't like 0-sized swapchain dimensions, even when those
dimensions are automatically set. Manually set them to a size that isn't
zero instead.
Why not.
Also, instead of disabling hue/saturation for RGB, just don't apply
them. (They don't make sense for conversion matrixes other than YUV, but
I can't be bothered to keep the fine-grained disabling of UI controls
either.)
WGL_NV_DX_interop is widely supported by Nvidia and AMD drivers. It
allows a texture to be shared between Direct3D and WGL, so that
rendering can be done with WGL and presentation can be done with
Direct3D. This should allow us to work around some persistent WGL
issues, such as dropped frames with some driver/OS combos, drivers that
buffer frames to increase performance at the cost of latency, and the
inability to disable exclusive fullscreen mode when using WGL to render
to a fullscreen window.
The addition of a DX_interop backend might also enable some cool
Direct3D-specific enhancements in the future, such as using the
GetPresentStatistics API to get accurate frame presentation timestamps.
Note that due to a driver bug, this backend is currently broken on
Intel. It will appear to work as long as the window is not resized too
often, but after a few changes of size it will be unable to share the
newly created renderbuffer with GL. See:
https://software.intel.com/en-us/forums/graphics-driver-bug-reporting/topic/562051
With the default setting, the matrix for fruit dithering requires 12
bits of precision (values from 0/4096 to 4095/4096). But a 16-bit float
provides only 10 bits. In addition, when `dither-size-fruit=8` is
set, 16 bits are required from the texture format.
Fix this by attempting to use a 16 bit integer texture first. This is
still not precise, but should be better than using a half float.
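To spell the arithmetic out: a half float has a 10 bit mantissa, so in
[0.5, 1.0) the step size is 2^-11, while distinguishing 4094/4096 from
4095/4096 needs steps of 2^-12. A sketch of the format preference
(desktop GL names):

    // try a normalized 16 bit integer texture first, then fall back
    static const struct { GLenum internal, format, type; } fmts[] = {
        {GL_R16,  GL_RED, GL_UNSIGNED_SHORT}, // exact for 12+ bit values
        {GL_R16F, GL_RED, GL_FLOAT},          // ~11 bit effective precision
    };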
The recent LUT adjustment changes broke interpolation.
The concatenation of the shader stages is a bit messy, and it seems like
sampler_prelude is not a good place to add this macro. Always add the
macro to every shader instead. (While this doesn't seem too elegant,
this isn't too inelegant either, and gets these problems out of the
way.)
The computation of the tex_mul variable was broken in multiple ways.
This variable is used e.g. by debanding for moving expansion of 10 bit
fixed-point input to normalized range to another stage of processing.
One obvious bug was that the rgb555 pixel format was broken. This format
has component_bits=5, but obviously it's already sampled in normalized
range, and does not need expansion. The tex_mul-free code path avoids
this by not using the colormatrix. (The code was originally designed to
work around dealing with the generally complicated pixel formats by only
using the colormatrix in the YUV case.)
Another possible bug was with 10 bit input. It expanded the input by
bringing the [0,2^10) range to [0,1], and then treating the expanded
input as 16 bit input. I didn't bother to check what this actually
computed, but it's somewhat likely it was wrong anyway. Now it uses
mp_get_csp_mul(), and disables expansion when computing the YUV matrix.
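Presumably the corrected computation looks something like this
(mp_get_csp_mul() as in csputils; bit counts for the 10-bit-in-16-bit
case):

    // expand 10 bit fixed-point data stored in a 16 bit texture to
    // [0,1] in one place, and skip the expansion in the YUV matrix
    float tex_mul = mp_get_csp_mul(params->colorspace,
                                   10 /* component bits */,
                                   16 /* texture bits */);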
It turns out that with accurate lookup we can now decrease the
default texture size. Do it to compensate for the performance loss
introduced by the LUT_POS macro.
Define a macro to correct the coordinate for lookup textures. Cache
the corrected coordinate for 1D filters and use mix() to minimize the
performance impact.
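Presumably the macro in question, added to the shader prelude (the
gl_sc_hadd() shader-cache call is an assumption): it remaps a [0,1]
coordinate so 0 and 1 hit the centers of the first and last texel
rather than the texture edges:

    gl_sc_hadd(sc, "#define LUT_POS(x, lut_size) "
                   "mix(0.5 / (lut_size), 1.0 - 0.5 / (lut_size), (x))\n");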
If the sampling point is placed diagonally, the radius difference
could be as large as sqrt(2.0). And a loosened check with (radius - 1)
would potentially include pixels out of the range.
Fix the check to handle these corner cases properly, avoiding
unnecessary texture lookups and improving performance a bit.
There are claims that nnedi3.c doesn't constitute its own new
implementation, but is derived from existing HLSL or OpenCL shaders
distributed under the LGPLv3 license.
Until these claims are resolved, do the "correct" thing and require
--enable-gpl3 to build nnedi.
At least I hope so.
Deriving the duration from the pts was not really correct. It doesn't
include speed adjustments, and becomes completely wrong if the user e.g.
changes the playback speed by a huge amount. Pass through the accurate
duration value by adding a new vo_frame field.
The value for vsync_offset was not correct either. We don't need the
error for the next frame, but the error for the current one. This wasn't
noticed because it makes no difference in symmetric cases, like 24 fps
on 60 Hz.
I'm still not entirely confident in the correctness of this, but it sure
is an improvement.
Also, remove the MP_STATS() calls - they're not really useful to debug
anything anymore.
This was just converting back and forth between int64_t/microseconds and
double/seconds. Remove this stupidity. The pts/duration fields are still
in microseconds, but they have no meaning in the display-sync case (also
drop printing the pts field from opengl/video.c - it's always 0).
This is a hack, but unfortunately the DwmGetCompositionTimingInfo
heuristic does not work in all cases (with multiple-monitors on Windows
8.1 and even with a single monitor in Windows 10.) See the comment in
mp_w32_is_in_exclusive_mode() for more details.
It should go without saying that if any better method of doing this
reveals itself, this hack should be dropped.
The D3D9 backend does not support GLES 3, which makes it pretty useless.
But it still might be a legitimate replacement of vo_direct3d.c on
Windows 7 machines.
Note that we could just use:
eglGetDisplay(EGL_D3D11_ELSE_D3D9_DISPLAY_ANGLE)
But for now I'll leave the old code. Maybe this can exclude use of
software rendering backends (EGL_PLATFORM_ANGLE_DEVICE_TYPE_WARP_ANGLE).
Since I'm not sure, I won't touch it.
Running mpv with default config will now pick up ANGLE by default. Since
some think ANGLE is still not good enough for hq features, extend the
"es" option to reject GLES backends, and add to to the opengl-hq preset.
One consequence is that mpv will by default use libswscale to convert
10 bit video to 8 bit, before it reaches the VO.
I decided that I actually can't stand how vo_opengl unnecessarily puts
the video through 3 shader stages (instead of 1). Thus, what was meant
to be a fallback for weak OpenGL implementations, the dumb-mode, now
becomes default if the user settings allow it.
The code required to check for the settings isn't so wild, so I guess
it's manageable. I still hope that one day, our rendering logic can
generate ideal shader stages for this case too.
Note that in theory, dumb-mode could be reenabled at runtime due to a
color management 3D LUT being set, so a separate dumb_mode field is
required. The dumb-mode option can't just be overwritten.
Unfortunately, color management still can't work, because no GLES
version specified so far supports fixed-point 16 bit textures. Maybe
we could use integer textures, but these don't support filtering.
Using float textures would be another possibility.
Polar scalers use 1D textures, because they're slightly faster on some
GPUs than 2D textures. But 2D textures work too, so add support for
them.
Allows using these scalers with ANGLE.
Just like commit f9a2fc59. There are probably some more such cases.
The vec2 constructor calls are probably fine, but don't bother with
confusing inconsistencies.
While desktop GL's glTexImage2D() essentially accepts anything, GLES is
much stricter. The combination of allowed formats/types/internal formats
is exactly specified. The GLES 3.0.4 specification lists them in
table 3.2. (The ANGLE API validation code references this table.)
The table could probably be extended into a general declarative table
about GL formats covering other uses, but this would be a big
non-trivial project, so don't bother and accept a minor degree
of duplication with other tables.
Note that the format and type do (or should) not matter here, because
no image data is transferred to the GPU.
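A sketch of what such a declarative (sub)table could look like, with a
few rows that are valid per that table:

    struct gl_format { GLint internal; GLenum format, type; };
    static const struct gl_format gles3_formats[] = {
        {GL_R8,      GL_RED,  GL_UNSIGNED_BYTE},
        {GL_RG8,     GL_RG,   GL_UNSIGNED_BYTE},
        {GL_RGBA8,   GL_RGBA, GL_UNSIGNED_BYTE},
        {GL_R16F,    GL_RED,  GL_HALF_FLOAT},
        {GL_RGBA16F, GL_RGBA, GL_HALF_FLOAT},
    };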
We don't only need float textures for advanced scaling - we also need
them to be filterable with GL_LINEAR. On GLES, this is not supported
until GLES 3.1, but some implementations expose them with extensions.
This makes advanced scaling sort-of work for GLES 3.0 (on ANGLE). It's
still not very advisable, as 8 bits might not be enough to avoid
debanding. (Ironically, the debanding filter can be enabled, and does
not raise any GL errors - but probably doesn't do anything useful.)
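The checks boil down to something like this (a sketch; the extension
strings are real, the surrounding fields follow mpv's struct GL only
loosely):

    bool have_float_tex = gl->es >= 300 ||
        strstr(gl->extensions, "GL_OES_texture_float");
    bool have_float_linear = gl->es >= 310 ||
        strstr(gl->extensions, "GL_OES_texture_float_linear");
    bool can_use_advanced_scaling = have_float_tex && have_float_linear;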
Turns out glGetTexLevelParameter, which is missing in ANGLE, is a
GLES3.1 function. Removing it from the list of core GLES3 functions
makes ANGLE work in GLES3 mode.
ANGLE is a GLES2 implementation for Windows that uses Direct3D 11 for
rendering, enabling vo_opengl to work on systems with poor OpenGL
drivers and bypassing some of the problems with native GL, such as VSync
in fullscreen mode.
Unfortunately, using GLES2 means that most of vo_opengl's advanced
features will not work. However, ANGLE is under rapid development, and
GLES3 support is supposed to be coming soon.
Something goes wrong somewhere. Don't bother, it's only needed for
compatibility with our absolute baseline (GL 2.1/GLES 2).
On the other hand, we can process nv12 formats just fine.
For the sake of vaapi interop, we want to use EGL. On the other hand,
because driver developers are full of shit, vdpau interop will not
work on EGL (even if the driver supports EGL). The latter happens with
both nvidia and AMD Mesa drivers.
Additionally, EGL vaapi interop support can apparently only be
detected at runtime by actually using it. While hwdec_vaegl.c already
does this, it
would require initializing libva on _every_ system, which will cause
libav to print an unpreventable bullshit message to the terminal.
Try to counter these huge loads of bullshit by adding more fucking
bullshit.
We want the following behavior:
- VO probed, backend probed: only accept non-sw, fail completely
otherwise
- VO forced, backend probed: use the first non-sw, or if none is found,
fall back to the first working sw backend
- VO probed, backend forced: (I don't care about this case)
- VO forced, backend forced: just use that backend
Also, on backend probe failure the vo->probed field was left in its old
state.
In the display-sync, non-interpolation case, and if the display refresh
rate is higher than the video framerate, we duplicate display frames by
rendering exactly the same screen again. The redrawing is cached with a
FBO to speed up the repeat.
Use glBlitFramebuffer() instead of another shader pass. It should be
faster.
For some reason, post-processing was run again on each display refresh.
Stop doing this, which should also be slightly faster. The only
disadvantage is that temporal dithering will be run only once per video
frame, but I can live with this.
One aspect is messy: clearing the background is done at the start on the
target framebuffer, so to avoid clearing twice and duplicating the code,
only copy the part of the framebuffer that contains the rendered video.
(Which also gets slightly messy - needs to compensate for coordinate
system flipping.)
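Roughly the blit in question (a sketch; coordinates illustrative, mpv's
GL wrapper naming assumed, with the swapped y coordinates in the
destination rectangle undoing the flipping mentioned above):

    gl->BindFramebuffer(GL_READ_FRAMEBUFFER, vimg_fbo);
    gl->BindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    gl->BlitFramebuffer(x0, y0, x1, y1,   // video rect in the FBO
                        x0, y1, x1, y0,   // flipped on screen
                        GL_COLOR_BUFFER_BIT, GL_NEAREST);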
The nnedi3 prescaler requires a normalized range to work properly,
but the original implementation did the range normalization after
the first step of the first pass. This could lead to severe quality
degradation when debanding is not enabled for NNEDI3.
Fix this issue by passing `tex_mul` into the shader code.
Fixes #2464
Pick the correct GLSL version from the GL_SHADING_LANGUAGE_VERSION
string. Might be somewhat questionable, as we expect the minor version
number not to have leading 0s.
Should help with cases when the reported GLSL version is much higher
than the equivalent of the reported GL version. This problem was
observed in combination with GL_ARB_uniform_buffer_object, which
can't be used if the declared GLSL version is too low.
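The parsing is essentially this (a sketch): take the leading
"major.minor" and combine the digits, which is exactly where a minor
like "5" instead of "50" (or a leading 0) would go wrong:

    int major = 0, minor = 0;
    if (sscanf(glsl_ver_string, "%d.%d", &major, &minor) == 2)
        glsl_version = major * 100 + minor;  // "4.50" -> 450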
Notes:
- Unfortunately the only way to talk to EGL from within DRM I could find
involves linking with GBM (generic buffer management for Mesa); see
the sketch after this list. Because of this, I'm pretty sure it won't
work with proprietary NVidia drivers, but then again, last time I
checked NVidia didn't offer proper screen resolution for VT.
- VT switching doesn't seem to work at all. It's worth mentioning that
using vo_drm before introduction of VT switcher had an anomaly where
user could switch to another VT and input text to it, while video
played on top of that VT. However, that isn't the case with drm_egl:
I can't switch to other VT during playback like this. This makes me
think that it's either a limitation coming from my firmware or from
EGL/KMS itself rather than a bug with my code. Nonetheless, I still
left (untestable) VT switching code in place, in case it's useful to
someone else.
- The mode_id, connector_id and device_path should be configurable for
power users and people who wish to watch videos on nonprimary screen.
Unfortunately I didn't see anything that would allow OpenGL backends
to register their own set of options. At the same time, adding them to
global namespace is pointless.
- A few dozens of lines could be shared with vo_drm (setting up VT
switching, most of code behind page flipping). I don't have any strong
opinion on this.
- Sometimes I get minor visual glitches. I'm not sure if there's a race
condition of some sort, an uninitialized variable (doubtful), or a
buggy driver. (I'm using integrated Intel HD Graphics 4400 with Mesa.)
- .config and .control are very minimal.
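For reference, the GBM/EGL bring-up mentioned in the first point looks
roughly like this (error handling and mode setting omitted; device path
and size illustrative):

    #include <fcntl.h>
    #include <gbm.h>
    #include <EGL/egl.h>

    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    struct gbm_device *gbm = gbm_create_device(fd);
    EGLDisplay dpy = eglGetDisplay((EGLNativeDisplayType)gbm);
    eglInitialize(dpy, NULL, NULL);
    EGLConfig config; // chosen with eglChooseConfig()
    struct gbm_surface *gbm_surf = gbm_surface_create(gbm, 1920, 1080,
            GBM_FORMAT_XRGB8888, GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
    EGLSurface surf = eglCreateWindowSurface(dpy, config,
            (EGLNativeWindowType)gbm_surf, NULL);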
Signed-off-by: wm4 <wm4@nowhere>
glXCreateContextAttribsARB() by design can throw some X11 errors. We
ignore these, but we generally still print error messages to the
terminal. This was confusing/annoying users, so silence it. The stupid
part is that the Xlib error handler is global, so we have to be slightly
careful here.
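The usual dance, for reference (a sketch): since the handler is
process-global, install it only around the call and restore it
immediately, after making sure pending errors have arrived:

    static int ignore_xerror(Display *dpy, XErrorEvent *ev) { return 0; }

    // around the context creation call:
    XErrorHandler old = XSetErrorHandler(ignore_xerror);
    ctx = glXCreateContextAttribsARB(dpy, fbconfig, NULL, True, attribs);
    XSync(dpy, False);  // flush so errors hit our handler, not the old one
    XSetErrorHandler(old);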
Commit 27dc834f added it as such.
Also remove the check for glUniformBlockBinding() - it's part of an
extension, and the check glGetUniformBlockIndex() already checks whether
the extension is fully available.
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling windows now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with the placebo setting) in two different ways: via a uniform
buffer object, or by hard-coding them into the shader source. UBOs
require OpenGL 3.1, which only guarantees 16kb per block. But I find
that 64kb seems to be a default setting for recent cards/drivers (which
nnedi3 is targeting), so I think we're fine here (with the default
nnedi3 setting the size of the weights is 9kb). Hard-coding them into
the shader requires OpenGL 3.3, for the "intBitsToFloat()" built-in
function. This is necessary to precisely represent these weights in
GLSL. I tried several human-readable floating point number formats
(with really high precision, as for single precision floats), but for
some reason they did not work nicely; bad pixels (with NaN values)
could be produced with some weight sets.
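Presumably the bit-exact emission looks like this (the printf-style
shader-cache call is an assumption):

    union { float f; int i; } u = { .f = weight };
    gl_sc_addf(sc, "intBitsToFloat(%d)", u.i);  // round-trips exactly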
We could also add support for uploading these weights with a texture,
just for compatibility reasons (e.g. upscaling a still image with a low
end graphics card). But as I tested, it's rather slow even with a 1D
texture (we would probably have to use a 2D texture due to dimension
size limitations). Since there is always a better choice for NNEDI3
upscaling of still images (the vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from users, it
should be easy to add later.
For those who want to optimize the performance a bit further, the
bottlenecks seem to be:
1. the overhead of uploading and accessing these weights (in
particular, the shader code will be regenerated for each frame; that
part is on the CPU, though);
2. "dot()" performance in the main loop;
3. "exp()" performance in the main loop; there are various fast
implementations with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
Add the Super-xBR filter for image doubling, and the prescaling framework
to support it.
The shader code was ported from the MPDN extensions project, with
modifications to process luma only.
This commit is largely inspired by code from #2266, with
`gl_transform_trans()` authored by @haasn taken directly.
next_vsync/prev_vsync was only used to retrieve the vsync duration. We
can get this in a simpler way.
This also removes the vsync duration estimation from vo_opengl_cb.c,
which is probably worthless anyway. (And once interpolation is made
display-sync only, this won't matter at all.)
Quoting MSDN: "Notifies the Desktop Window Manager (DWM) to opt in to or
out of Multimedia Class Schedule Service (MMCSS) scheduling while the
calling process is alive.". Whatever this means. (An application can
change the scheduling priority of the window manager?)
Does this improve anything? I have no idea. Certainly this is a program
that does multimedia and graphics, so we seem to be a good match for
this.
Is it bad if we enable this even while playback is inactive or paused? I
have no idea either.
Is there a magic cargo cult function that will mark our renderer thread
as a multimedia thing? I have no idea. (We use a function to enable MMCSS
for our audio thread in ao_wasapi.)
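For reference, the call itself is just (per MSDN):

    #include <dwmapi.h>
    HRESULT hr = DwmEnableMMCSS(TRUE); // opt in while the process lives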
Enable it by default, but not unconditionally. Add an "auto" mode, which
disables DwmFlush if the compositor is (probably) inactive. Let's see how
this goes.
Since I accidentally enabled DwmFlush always by default (more or less)
in a previous commit touching this code, this is probably mostly just
cargo-culting, and it's uncertain whether it does anything.
Note that I still got bad vsync behavior when fullscreening mpv, and
making another window visible on the same screen. This happens even if
forcing DWM.
Yet another relatively useless option that tries to make OpenGL's sync
behavior somewhat sane. The results are not too encouraging. With a
value of 1, vsync jitter is gone on nVidia, but there are frame drops
(less than with glfinish). With 2, I get the usual vsync jitter _and_
frame drops.
There's still some hope that it might prevent too deep queuing with some
GPUs, I guess.
The timeout for the wait call is 1 second. The value is pretty
arbitrary; it just should not be so high that it can freeze the process
(if the GPU is un-nice), and not so low that it triggers in normal
cases, even when the GPU load is very high. So I guess 1 second is ok
as a timeout.
The idea to use fences this way to control the queue depth was stolen
from RetroArch:
df01279cf3/gfx/drivers/gl.c (L1856)
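The scheme is roughly this (a sketch; option and field names
hypothetical, the 1 second timeout as discussed above):

    p->fences[p->num_fences++] =
        gl->FenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    while (p->num_fences > opts->waitvsync_depth) {
        gl->ClientWaitSync(p->fences[0], GL_SYNC_FLUSH_COMMANDS_BIT,
                           1000000000);  // 1 second, in nanoseconds
        gl->DeleteSync(p->fences[0]);
        p->num_fences--;
        memmove(&p->fences[0], &p->fences[1],
                p->num_fences * sizeof(GLsync));
    }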
vo_frame.num_vsyncs can be != 1 in some cases in normal sync mode too.
This is not a very exact fix, but in exchange it's robust. (These
vo_frame flags are way too tricky in combination with redrawing and
such.)
There were occasional shader compilation and rendering failures if FBOs
were unavailable. This is caused by the FBO caching code getting active,
even though FBOs are unavailable (i.e. dumb-mode).
Broken by commit 97fc4f.
Fixes #2432.
This speeds up redraws considerably (improving e.g. <60 Hz material on a
60 Hz monitor with display-sync active, or redraws while paused), but
slightly slows down the worst case (e.g. video FPS = display FPS).
Older systems have certain EGL extension definitions missing. We
redefine them to make the build system easier, and because it's trivial.
But we forgot to define the EGL_LINUX_DMA_BUF_EXT identifier. (I hope
it's the only missing one.)
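The redefinition amounts to (value from the EGL_EXT_image_dma_buf_import
extension spec):

    #ifndef EGL_LINUX_DMA_BUF_EXT
    #define EGL_LINUX_DMA_BUF_EXT 0x3270
    #endif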
It's great that the new algorithm supports multiple placebo iterations
and all, but it's really not necessary and hurts performance in the
general case for the sake of the 0.1% that actually pause the screen
and look for minute differences.
Signed-off-by: wm4 <wm4@nowhere>
Adds support for AV_PIX_FMT_GBRP9, AV_PIX_FMT_GBRP10, AV_PIX_FMT_GBRP12,
AV_PIX_FMT_GBRP14, AV_PIX_FMT_GBRP16, AV_PIX_FMT_GBRAP, and
AV_PIX_FMT_GBRAP16.
(Not that it matters, because nobody uses these anyway.)
Newer nVidia drivers support EGL, but they seem to work badly,
apparently don't support some needed features or not in a form we want
(such as swap control), and vdpau interop is not available. Disable it
by default, because I'm tired of explaining this issue.
Can be reverted as soon as nVidia release working drivers.
This parameter has been unused for years (the last flag was removed in
commit d658b115). Get rid of it.
This affects the general VO API, as well as the vo_opengl backend API,
so it touches a lot of files.
The VOFLAGs are still used to control OpenGL context creation, so move
them to the OpenGL backend code.
This gets rid of an old hack, VOFLAG_HIDDEN. Although handling of it has
been sane for a while, it used to cause much pain, and is still
unintuitive and weird even today.
The main reason for this hack is that OpenGL selects a X11 Visual for
you, and you're supposed to use this Visual when creating the X window
for the OpenGL context. Which means the X window can't be created early
in the common X11 init code, but the OpenGL code needs to do something
before that. API-wise you need separate functions for X11 init and X11
window creation. The VOFLAG_HIDDEN hack conflated window creation and
the entrypoint for resizing on video resolution change into one
function, vo_x11_config_vo_window(). This required all platform backends
to handle this flag, even if they didn't need this mechanism.
Wayland still uses this for minor reasons (alpha support?), so the
wayland backend must be changed before the flag can be entirely removed.
If interpolation is enabled, then this causes heavy artifacts if done
while unpaused. It's preferable to allow a latency of a few frames for
the change to take full effect instead. If this is done paused, the
frame is fully redrawn anyway.
It doesn't deal with VDA at all anymore. Rename it to hwdec_osx.c. Not
using hwdec_videotoolbox.c, because that would give it the longest
source path in this project yet. (Also, this code isn't even
VideoToolbox-specific, other than the name of the pixel format used.)
VideoToolbox is preferred. Now that FFmpeg released 2.8, there's no
reason to support VDA anymore. In fact, we had a bug that made VDA not
usable with older FFmpeg versions in some newer mpv releases.
VideoToolbox is supported even on slightly older OSX versions, and if
not, you still can run mpv without hw decoding.
There are at least 2 ways of using VAAPI without X11 (Wayland, DRM).
Remove the X11 requirement from the decoder part and the EGL interop.
This will be used by a following commit, which adds Wayland support.
The worst about this is the decoder part, which includes a bad hack for
using the decoder without any VO interop (also known as "vaapi-copy"
mode). Separate the X11 parts so that they're self-contained. For the
EGL interop code we do something similar (it's kept slightly simpler,
because it essentially only has to translate between our silly
MPGetNativeDisplay abstraction and the vaGetDisplay...() call).
It looks like my hope that we can unconditionally include EGL headers in
the OpenGL code is not coming true, because OSX does not support EGL at
all. So I prefer loading the VAAPI EGL/GL specific extensions manually,
because it's less of a mess. Partially reverts commit d47dff3f.
While EGL 1.4 seemed a bit ambiguous about this to me, it actually says
quite clearly that core functions are not supported with
eglGetProcAddress() in the following paragraph.
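Hence the manual loading of only the extension entry points (a sketch;
the typedefs are the standard ones from EGL/eglext.h):

    PFNEGLCREATEIMAGEKHRPROC CreateImageKHR = (PFNEGLCREATEIMAGEKHRPROC)
        eglGetProcAddress("eglCreateImageKHR");
    PFNEGLDESTROYIMAGEKHRPROC DestroyImageKHR = (PFNEGLDESTROYIMAGEKHRPROC)
        eglGetProcAddress("eglDestroyImageKHR");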
Normally, we prefer GLX on X11. But for the VAAPI EGL interop, we
obviously want EGL. Since nvidia does not provide EGL with desktop GL
yet, we can leave it to the autoprobing. Just make sure some failure
messages don't unnecessarily show up in the nvidia case.
This breaks VAAPI GLX interop by default, but I don't care much. If
you use --hwdec=auto (which you should if you want hw decoding), this
should fallback to vaapi-copy instead.
Probe the surface format, and check whether it's really something we
support. This also does a complete check whether the EGL interop works
at all (the only way to find this out is actually running this code).
Also, support YV12. Under some circumstances, vaapi (with Intel
drivers) can be made to use this format.
Unfortunately, the Intel drivers show some very weird behavior, which
is hopefully a bug. insane_hack() provides a very evil workaround (see
comments). A proper solution might be passing the hw format as part of
mp_image_params, but as long as hw surfaces appear to be able to change
the format on the fly, attempting this is probably not worth the extra
complexity and likely fragility. The hack allows us to pretend that
there is sane behavior for now.
Broken by commit d47dff3f. If something is going to include EGL.h,
header_fixes.h has to know. This definitely affected vo_rpi, and
probably affects wayland builds (with x11egl disabled) as well.
Checking and resetting the VAImage.buf field is nonsense, even if it
happened to work out in the normal case. buf is actually freed when
vaDestroyImage() is called (not quite intuitive), and we need an extra
field to know whether vaReleaseBufferHandle() has to be called.
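The resulting cleanup order, roughly (flag name hypothetical):

    if (p->buf_handle_acquired) {
        vaReleaseBufferHandle(p->display, p->image.buf);
        p->buf_handle_acquired = false;
    }
    vaDestroyImage(p->display, p->image.image_id); // also frees image.buf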
Add the upstream symbolic names as comments. Normally, these should be
defined in libdrm's drm_fourcc.h header. But DRM_FORMAT_R8 and
DRM_FORMAT_GR88 are not defined anywhere, except in the kernel userland
headers of Linux 4.3 (!). We don't want mpv to depend on bleeding-edge
Linux kernel headers, so this will have to do.
Also, just for completeness, add fourccs for the 3 and 4 channel
formats. I didn't manage to test them, though.
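Concretely, the fallback definitions are (values matching the Linux 4.3
drm_fourcc.h):

    #ifndef DRM_FORMAT_R8
    #define DRM_FORMAT_R8   fourcc_code('R', '8', ' ', ' ')
    #endif
    #ifndef DRM_FORMAT_GR88
    #define DRM_FORMAT_GR88 fourcc_code('G', 'R', '8', '8')
    #endif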
Reduces the number of hardcoded assumptions about the layout
drastically. (Now adding yuv420 support would be just adjusting an if,
if you ignore the other problems, such as determining the hw format at
all early enough.)
Don't call eglDestroyImageKHR() on the same ID possibly more than once.
Clear the image reference on termination, or we would leak up to 1 image
per VO recreation.
Should work much better than the old GLX interop code. Requires Mesa 11,
and explicitly selecting the X11 EGL backend with:
--vo=opengl:backend=x11egl
Should it turn out that the new interop works well, we will try to
autodetect EGL by default.
This code still uses some bad assumptions, like expecting surfaces to be
in NV12. (This is probably ok, because virtually all HW will use this
format. But we should at least check this on init or so, instead of
failing to render an image if our assumption doesn't hold up.)
This repo was a lot of help: https://github.com/gbeauchesne/ffvademo
The kodi code was also helpful (the magic FourCCs it uses for
EGL_LINUX_DRM_FOURCC_EXT are documented nowhere, and
EGL_IMAGE_INTERNAL_FORMAT_EXT as used in ffvademo does
not actually exist).
(This is the 3rd VAAPI GL interop that was implemented in this player.)
The VAAPI EGL interop code will need access to the X11 Display. While
GLX could return it from the current GLX context, EGL has no such
mechanism. (At least no standard one supported by all implementations.)
So mpv makes up such a mechanism.
For internal purposes, this is a rather awkward solution, but it's
needed for libmpv anyway.
These extensions use a bunch of EGL types, so we need to include the EGL
headers in common.h to use our GL function loader with this.
In the future, we should probably require presence of the EGL headers to
reduce the hacks. This might be not so simple at least with OSX, so for
now this has to do.
Surfaces used by hardware decoding formats can be mapped exactly like a
specific software pixel format, e.g. RGBA or NV12. p->image_params is
supposed to be set to this format, but it wasn't.
(How did this ever work?)
Also, setting params->imgfmt in the hwdec interop drivers is pointless
and redundant. (Change them to asserts, because why not.)
This is a pseudo-OpenGL extension for letting libmpv query native
windowing system handles from the API user. (It uses the OpenGL
extension mechanism because I'm lazy. In theory it would be nicer to let
the user pass them with mpv_opengl_cb_init_gl(), but this would require
a more intrusive API change to extend its argument list.)
The naming of the extension and associated function was unnecessarily
Windows specific (using "D3D"), even though it would work just fine for
other platforms. So deprecate the old names and introduce new ones. The
old ones still work.
This turns the old scalers (inherited from MPlayer) into a pre-
processing step (after color conversion and before scaling). The code
for the "sharpen5" scaler is reused for this.
The main reason MPlayer implemented this as scalers was perhaps because
FBOs were too expensive, and making it a scaler allowed implementing
this in 1 pass. But unsharp masking is not really a scaler, and I would
guess the result is more like combining bilinear scaling and unsharp
masking.
2 things are being stupid here: Apple for requiring rectangle textures
with their IOSurface interop for no reason, and OpenGL having a
different sampler type for rectangle textures.
The removal of source-shader is a side effect, since this effectively
replaces it - and the video-reading code has been significantly
restructured to make more sense and be more readable.
This means users no longer have to constantly download and maintain a
separate deband.glsl installation alongside mpv, which was the only real
use case for source-shader that we found anyway.
This is mostly to cut down somewhat on the amount of code bloat in
video.c by moving out helper functions (including scaler kernels and
color management routines) to a separate file.
It would certainly be possible to move out more functions (e.g. dithering
or CMS code) with some extra effort/refactoring, but this is a start.
Signed-off-by: wm4 <wm4@nowhere>