Glitches when resizing are still possible, but are reduced. Other VOs
could support this too, but don't need to do so.
(Totally avoiding glitches would be much more effort, and probably not
worth the trouble. How about you just watch the video the player is
playing, instead of spending your time resizing the window.)
Until now, we have tried to create a GL 3.0 context. The main reason for
this is that many Mesa-based drivers did not support anything better.
But some drivers (Mesa AMD) will not report a higher OpenGL version,
because their compatibility mode is restricted. While later GL features
are reported as extensions just fine, there doesn't seem to be a way to
determine or enable higher GLSL versions.
Add some more shitty hacks to try to deal with this messed up situation,
and try to probe each interesting GL version separately (starting with
3.3, then 3.2 etc.). Other backends might suffer from similar problems,
but these will have to deal with it on their own.
Probably fixes#2938, or maybe not.
converted_imgfmt will be used by the renderer logic to build an
appropriate shader chain. It doesn't influence the format of any
textures. Thus it doesn't matter whether the hw video surface is mapped
as RGB or RGBA. What matters is if the video actually contains alpha or
not. Since virtually all hardware decoder do not support alpha in any
way, this can be hardcoded as "no alpha".
This avoids unnecessary GPU work.
This also gets rid of the kind of hard to read texture swizzle setup and
turns it into something dumber.
Assumes that we don't create any FBOs with 2 channel formats. (Only the
video source textures are handled by this commit.)
Previously, gl->DXOpenDeviceNV was called twice using dxva2 with dxinterop. AMD
drivers refused to allow this. With this commit, context_dxinterop sets its own
implementation of MPGetNativeDisplay, which can return either a
IDirect3DDevice9Ex or a dxinterop_device_HANDLE depending on the "name" request
string. hwdec_dxva2gldx then requests both of these avoiding the need to call
gl->DXOpenDeviceNV a second time.
Like dxinterop, this uses StretchRect or RGB conversion. This is unavoidable as
long as we use the dxva2 API, as there is no way to access the raw hardware
decoded Direct3D9 surfaces.
The default of 1.0 was basically making half the algorithm do nothing,
since it turned off all diagonal contributions. The upstream default is
0.6, and this produces a more reasonable image.
The values were changed to reflect an upstream change in the source for
the super-xBR implementation.
The anti-ringing code was basically not working at all, the new
algorithm _significantly_ improves the result (reduces ringing).
This is a fresh implementation from scratch that carries with it
significantly less baggage and verbosity from the previous (ported)
version.
The actual values for the masks and such were copied from the
current code. Behavior and performance should be unaffected.
An important difference between the old code and the new code is that
the new code always explicitly samples from the first component, rather
than being able to process multiple planes at once.
Since prescale-luma only affects luma, I deemed this unnecessary. May
change in the future, if prescale-chroma ever gets implemented. But
prescaling multiple planes would be slow to do this way. (Better would
be to generalize it to differently-sized vectors)
Instead of hard-coding the logic and planes to skip, factor this out
to a reusible function, and instead add the number of relevant
coordinates to the texture state.
Since prescale now literally only affects the luma plane (and the
filters are all designed for luma-only operation either way), the option
has been renamed and the documentation updated to clarify this.
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
Why was this done so stupidly, with so many complicated special cases,
before? Declare it once so the shader bits don't have to figure out where
and when to do so themselves.
The WGL_NV_DX_interop spec says that a shared IDirect3DSurface9 must not
be lockable, but off-screen plain surfaces are always lockable and using
them causes Nvidia drivers to crash. Use a rendertarget for the shared
surface instead.
This also changes the name of the DX_interop handle for the rendertarget
to match the name of the DirectX object (rather than the GL one) to
match the convention used in context_dxinterop.c.
Apple crap (namely hardware decoding interop) forces us to use rectangle
textures for input. But after that we continue with normal textures.
This was not considered for debanding, and the sampler type used for it
can be different depending on the exact render chain. Simply use the
target type of the input texture.
* use mp_HRESULT_to_str/mp_LastError_to_str
* make some messages non-identical
* replace "GL" -> "OpenGL"
* change some MP_FATAL to MP_ERR that don't actually kill the vo
It thinks that integer_conv_fbo[index] is implied to be accessed with up
to index=5. Although that is theoretical only, it has a point that this
makes no sense. Use the same constant for the array allocation, to make
it more uniform and robust.
Fixes CID 1350060.
Since there can be multiple backends for a single API (vaapi can use GLX
or EGL), not logging the exact backend name is annoying. So add it. At
the same time, there is no need to duplicate the name as used by the
--hwdec options, so replace it with using the numeric hwdec API ID.
GLES requires this. Some more common sampler types have default
precisions, but not usampler2D. Newer ANGLE builds verify this more
strictly than older builds, so this wasn't caught before.
Fixes#2761.
GLES does not support high bit depth fixed point textures for unknown
reasons, so direct 10 bit input is not possible. But we can still use
integer textures, which are supported by GLES 3.0. These store integer
data just like the standard fixed point textures, except they are not
normalized on sampling. They also don't support bilinear filtering, and
require a special sampler ("usampler2D").
While these texture formats enable us to shuffle the data to the GPU,
they're rather impractical with the requirements mentioned above and our
current architecture. One problem is that most code assumes it can
always use bilinear scaling (even if bilinear is never used when using
appropriate scale/cscale options). Another is that we don't have any
concept of running a function on a texture in an uniform way.
So for now, run a simple conversion step through a FBO. The FBO will use
the rgba16f format normally, which gives enough bits for 10 bit, and
will at least gracefully degrade with higher depth input.
This is bound to be much slower than a more "direct" method, but at
least it works and is simple to implement.
The odd change of function call order in init_video() is to properly
disable "dumb mode" (no FBO use) if these texture formats are in use.
This was never reset - absolutely can't be right. If the renderer
somehow switches back to another codepath, it certainly has to be reset.
Maybe this was hard to hit, as the normalization is going to be
idempotent in simpler cases (like rendering RGBA input).
Also get rid of the "merged" variable.
Often requested. The main argument, that prominent scalers like sharpen
change the image even if no scaling happens, disappeared anyway.
("sharpen", unsharp masking, is neither prominent nor a scaler anymore.
This is an artifact from MPlayer, which fuses unsharp masking with
bilinear scaling in order to make it single-pass, or so.)
Some VOs had support for these - remove them.
Typically, these formats will have only some use in cases where using
RGB software conversion with libswscale is faster than letting the
VO/GPU do it (i.e. almost never). For the sake of testing this case,
keep IMGFMT_RGB565. This is the least messy format, because it has no
padding/alpha bits with unknown semantics.
Note that decoding to these formats still works. We'll let libswscale
repack the data to whatever the VO in use can take.
Do this to make the license situation less confusing.
This change should be of no consequence, since LGPL is compatible with
GPL anyway, and making it LGPL-only does not restrict the use with GPL
code.
Additionally, the wording implies that this is allowed, and that we can
just remove the GPL part.
This covers source files which were added in mplayer2 and mpv times
only, and where all code is covered by LGPL relicensing agreements.
There are probably more files to which this applies, but I'm being
conservative here.
A file named ao_sdl.c exists in MPlayer too, but the mpv one is a
complete rewrite, and was added some time after the original ao_sdl.c
was removed. The same applies to vo_sdl.c, for which the SDL2 API is
radically different in addition (MPlayer supports SDL 1.2 only).
common.c contains only code written by me. But common.h is a strange
case: although it originally was named mp_common.h and exists in MPlayer
too, by now it contains only definitions written by uau and me. The
exceptions are the CONTROL_ defines - thus not changing the license of
common.h yet.
codec_tags.c contained once large tables generated from MPlayer's
codecs.conf, but all of these tables were removed.
From demux_playlist.c I'm removing a code fragment from someone who was
not asked; this probably could be done later (see commit 15dccc37).
misc.c is a bit complicated to reason about (it was split off mplayer.c
and thus contains random functions out of this file), but actually all
functions have been added post-MPlayer. Except get_relative_time(),
which was written by uau, but looks similar to 3 different versions of
something similar in each of the Unix/win32/OSX timer source files. I'm
not sure what that means in regards to copyright, so I've just moved it
into another still-GPL source file for now.
screenshot.c once had some minor parts of MPlayer's vf_screenshot.c, but
they're all gone.
Should take care of the planned FFmpeg AV_PIX_FMT_P010 addition. (This
will eventually be needed when doing HEVC Main 10 decoding with DXVA2
copyback.)
This file claims to be based on the "MPlayer VA-API patch", but this is
untrue. Only some glue code was copied from hwdec_vaglx.c, and this glue
code was never in MPlayer or the MPlayer VA-API patch in any form, and
instead part of the mpv-original way we do hardware decoding OpenGL
interop. The EGL interop method didn't exist at the time the MPlayer
VA-API patch was created either.
GLSL in GLES 2.0 did not have line continuation in its preprocessor.
This broke shader compilation. It also broke subtitle rendering in
vo_rpi, which reuses some of the OpenGL code.
Line continuation was finally added in GLES 3.0, which is perhaps the
reason why ANGLE accepted it.
Untested, but should be fine. Broken by commit 0a0bb905.
Also fix the include statement in context_rpi.c, which caused another
compilation failure. Also untested. (Because I'm lazy.)
Fixes#2638.
gcc 4.8 does not support C11 thread local storage. This is a bit
annoying, so add a hack to use the gcc specific __thread extension if
C11 TLS is not available.
(This is used for the extremely silly mpv-internal way hwdec modules
access some platform specific handles. Disabling it simply made
hwdec_vaegl.c always fail initialization.)
Fixes#2631.
Add a "blend-tiles" choice to the "alpha" sub-option. This is pretty
simplistic and uses the GL raster position to derive the tiles. A weird
consequence is that using --vo=opengl and --vo=opengl-hq gives different
scaling behavior (screenspace pixel size vs. source video pixel size
16x16 tiles), but it seems we don't have easy access to the original
texture coordinates. Using the rasterpos is probably simpler.
Make this option the default.
long is 64 bits on x86_64 on Linux, which means the check for the corner
case of computing the depth mask is wrong.
Also, X11 compositors seem to expect premultiplied alpha.
Since alpha isn't pulled through the colormatrix (maybe it should?), we
reject alpha formats with odd sizes, such as yuva444p10.
But the awful tex_mul path in vo_opengl does this anyway (at some points
even explicitly), which means there will be a subtle difference in
handling of 16 bit yuv alpha formats. Make it consistent and always
apply the range adjustment to the alpha component. This also means odd
sizes like 10 bit are supported now.
This assumes alpha uses the same "shifted" range as the yuv color
channels for depths larger than 8 bit. I'm not sure whether this is
actually the case.
Now common.c only contains the code for the function loader, while
context.c contains the backend loader/dispatcher.
Not calling it "backend.c", because the central struct is called
MPGLContext.
This is used for dithering, although I'm not aware of anyone who got
higher than 8 bit depth support to work on Linux.
Also put this into egl_helpers.c. Since EGL is pseudo-portable at best I
have no hope that the EGL context creation code in all the backends can
be fully shared. But some self-contained functionality can definitely be
shared.
Store the determined framebuffer depth in struct GL instead of
MPGLContext. This means gl_video_set_output_depth() can be removed, and
also justifies adding new fields describing framebuffer/backend
properties to struct GL instead of having to add more functions just to
shovel the information around.
Keep in mind that mpgl_load_functions() will wipe struct GL, so the
new fields must be set before calling it.
Although the source file is named w32.c, the backend name was "win"
until recently. It was accidentally changed to "w32"; fix it.
Fixes#2608 (the manual is correct).
When a Direct3D 9Ex device fails to reset, it gets put into the lost
state, so set the lost_device flag and don't attempt to present until
the device moves out of that state. Failure to recreate the size-
dependent objects should set lost_device as well, since we shouldn't try
to present in that state.
Also, it looks like I was too eager to remove code that sets priv
members to NULL and I accidentally removed some that was needed.
Direct3D doesn't like 0-sized swapchain dimensions, even when those
dimensions are automatically set. Manually set them to a size that isn't
zero instead.
Why not.
Also, instead of disabling hue/saturation for RGB, just don't apply
them. (They don't make sense for conversion matrixes other than YUV, but
I can't be bothered to keep the fine-grained disabling of UI controls
either.)
WGL_NV_DX_interop is widely supported by Nvidia and AMD drivers. It
allows a texture to be shared between Direct3D and WGL, so that
rendering can be done with WGL and presentation can be done with
Direct3D. This should allow us to work around some persistent WGL
issues, such as dropped frames with some driver/OS combos, drivers that
buffer frames to increase performance at the cost of latency, and the
inability to disable exclusive fullscreen mode when using WGL to render
to a fullscreen window.
The addition of a DX_interop backend might also enable some cool
Direct3D-specific enhancements in the future, such as using the
GetPresentStatistics API to get accurate frame presentation timestamps.
Note that due to a driver bug, this backend is currently broken on
Intel. It will appear to work as long as the window is not resized too
often, but after a few changes of size it will be unable to share the
newly created renderbuffer with GL. See:
https://software.intel.com/en-us/forums/graphics-driver-bug-reporting/topic/562051
With default setting, the matrix for fruit dithering requires 12 bits
precision (values from 0/4096 to 4095/4096). But 16-bit float
provides only 10 bits. In addition, when `dither-size-fruit=8` is
set, 16 bits are required from the texture format.
Fix this by attempting to use 16 bit integer texture first. This is
still not precise, but should be better than using a half float.
The recent LUT adjustment changes broke interpolation.
The concatenation of the shader stages is a bit messy, and it seems like
sampler_prelude is not a good place to add this macro. Always add the
macro to every shader instead. (While this doesn't seem too elegant,
this isn't too inelegant either, and goes these problems out of the
way.)
The computation of the tex_mul variable was broken in multiple ways.
This variable is used e.g. by debanding for moving expansion of 10 bit
fixed-point input to normalized range to another stage of processing.
One obvious bug was that the rgb555 pixel format was broken. This format
has component_bits=5, but obviously it's already sampled in normalized
range, and does not need expansion. The tex_mul-free code path avoids
this by not using the colormatrix. (The code was originally designed to
work around dealing with the generally complicated pixel formats by only
using the colormatrix in the YUV case.)
Another possible bug was with 10 bit input. It expanded the input by
bringing the [0,2^10) range to [0,1], and then treating the expanded
input as 16 bit input. I didn't bother to check what this actually
computed, but it's somewhat likely it was wrong anyway. Now it uses
mp_get_csp_mul(), and disables expansion when computing the YUV matrix.
It turns out that with accurate lookup we can decrease the
default size of texture now. Do it to compensate the performance
loss introduced by the LUT_POS macro.
Define a macro to correct the coordinate for lookup texture. Cache
the corrected coordinate for 1D filter and use mix() to minimize the
performance impact.
If the sampling point is placed diagonally, the radius difference
could be as large as sqrt(2.0). And a loosened check with (radius - 1)
would potentially include pixels out of the range.
Fix the check to handle those corner case properly to avoid
unnecessary texture lookup and improve the performance a bit.
There are claims that nnedi3.c doesn't constitute its own new
implementation, but is derived from existing HLSL or OpenCL shaders
distributed under the LGPLv3 license.
Until these are resolved, do the "correct" thing and require
--enable-gpl3 to build nnedi.
At least I hope so.
Deriving the duration from the pts was not really correct. It doesn't
include speed adjustments, and becomes completely wrong of the user e.g.
changes the playback speed by a huge amount. Pass through the accurate
duration value by adding a new vo_frame field.
The value for vsync_offset was not correct either. We don't need the
error for the next frame, but the error for the current one. This wasn't
noticed because it makes no difference in symmetric cases, like 24 fps
on 60 Hz.
I'm still not entirely confident in the correctness of this, but it sure
is an improvement.
Also, remove the MP_STATS() calls - they're not really useful to debug
anything anymore.
This was just converting back and forth between int64_t/microseconds and
double/seconds. Remove this stupidity. The pts/duration fields are still
in microseconds, but they have no meaning in the display-sync case (also
drop printing the pts field from opengl/video.c - it's always 0).