/*
 * This file is part of mpv.
 *
 * mpv is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * mpv is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with mpv. If not, see <http://www.gnu.org/licenses/>.
 */

#include <assert.h>
#include <math.h>
#include <stdarg.h>
#include <stdbool.h>
#include <string.h>

#include <libavutil/common.h>
#include <libavutil/lfg.h>

#include "video.h"

#include "misc/bstr.h"
#include "options/m_config.h"
#include "common/global.h"
#include "options/options.h"
#include "common.h"
#include "formats.h"
#include "utils.h"
#include "hwdec.h"
#include "osd.h"
#include "stream/stream.h"
#include "superxbr.h"
#include "nnedi3.h"
#include "video_shaders.h"
#include "user_shaders.h"
#include "video/out/filter_kernels.h"
#include "video/out/aspect.h"
#include "video/out/bitmap_packer.h"
#include "video/out/dither.h"
#include "video/out/vo.h"

// Maximum number of prescaler passes that can be applied.
#define MAX_PRESCALE_PASSES 5

// Maximum number of saved textures (for user script purposes)
#define MAX_TEXTURE_HOOKS 16
#define MAX_SAVED_TEXTURES 32

// scale/cscale arguments that map directly to shader filter routines.
// Note that the convolution filters are not included in this list.
static const char *const fixed_scale_filters[] = {
    "bilinear",
    "bicubic_fast",
    "oversample",
    "custom",
    NULL
};
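// tscale arguments that map directly to shader filter routines.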
static const char *const fixed_tscale_filters[] = {
    "oversample",
    NULL
};

// must be sorted, and terminated with 0
int filter_sizes[] =
    {2, 4, 6, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 0};
int tscale_sizes[] = {2, 4, 6, 0}; // limited by TEXUNIT_VIDEO_NUM

struct vertex_pt {
    float x, y;
};
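// Vertex format for the rendered video quad: a 2D position plus one texture
// coordinate pair for each of the TEXUNIT_VIDEO_NUM video texture units.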
struct vertex {
    struct vertex_pt position;
    struct vertex_pt texcoord[TEXUNIT_VIDEO_NUM];
};
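// Vertex attribute layout describing struct vertex to the VAO helpers.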
static const struct gl_vao_entry vertex_vao[] = {
    {"position", 2, GL_FLOAT, false, offsetof(struct vertex, position)},
    {"texcoord0", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[0])},
    {"texcoord1", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[1])},
    {"texcoord2", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[2])},
    {"texcoord3", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[3])},
    {"texcoord4", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[4])},
    {"texcoord5", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[5])},
    {0}
};
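// A single GL texture holding one plane of the source video image.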
struct texplane {
    int w, h;
    int tex_w, tex_h;
    GLint gl_internal_format;
    GLenum gl_target;
    bool use_integer;
    GLenum gl_format;
    GLenum gl_type;
    GLuint gl_texture;
    int gl_buffer;
    char swizzle[5];
};
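// The textures (one texplane per plane) of the currently uploaded or
// hwdec-mapped video frame.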
struct video_image {
    struct texplane planes[4];
    bool image_flipped;
    struct mp_image *mpi;       // original input image
    bool hwdec_mapped;
};
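// Classifies the contents of a texture/plane (e.g. luma vs. chroma vs. RGB).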
enum plane_type {
    PLANE_NONE = 0,
    PLANE_RGB,
    PLANE_LUMA,
    PLANE_CHROMA,
    PLANE_ALPHA,
    PLANE_XYZ,
};

// A self-contained description of a source image which can be bound to a
// texture unit and sampled from. Contains metadata about how it's to be used
struct img_tex {
    enum plane_type type; // must be set to something non-zero
    int components; // number of relevant coordinates
    float multiplier; // multiplier to be used when sampling
    GLuint gl_tex;
    GLenum gl_target;
    bool use_integer;
    int tex_w, tex_h; // source texture size
    int w, h; // logical size (after transformation)
    struct gl_transform transform; // rendering transformation
    char swizzle[5];
};

// A named img_tex, for user scripting purposes
struct saved_tex {
    const char *name;
    struct img_tex tex;
};

// A texture hook. This is some operation that transforms a named texture as
// soon as it's generated
struct tex_hook {
    const char *hook_tex;
    const char *save_tex;
    const char *bind_tex[TEXUNIT_VIDEO_NUM];
    int components; // how many components are relevant (0 = same as input)
    void *priv; // this can be set to whatever the hook wants
    void (*hook)(struct gl_video *p, struct img_tex tex, // generates GLSL
                 struct gl_transform *trans, void *priv);
    void (*free)(struct tex_hook *hook);
};
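// A rendered frame kept in an FBO together with its presentation timestamp;
// these form the queue used for frame interpolation (tscale).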
struct fbosurface {
    struct fbotex fbotex;
    double pts;
};

#define FBOSURFACES_MAX 10
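// Contents of a file loaded from disk (e.g. a user shader), cached by path.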
struct cached_file {
    char *path;
    struct bstr body;
};
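// Complete state of a single gl_video renderer instance.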
struct gl_video {
    GL *gl;

    struct mpv_global *global;
    struct mp_log *log;
    struct gl_video_opts opts;
    struct gl_lcms *cms;
    bool gl_debug;

    int texture_16bit_depth;    // actual bits available in 16 bit textures

    struct gl_shader_cache *sc;

    struct gl_vao vao;

    struct osd_state *osd_state;
    struct mpgl_osd *osd;
    double osd_pts;

    GLuint lut_3d_texture;
    bool use_lut_3d;

    GLuint dither_texture;
    int dither_size;

    GLuint nnedi3_weights_buffer;

    struct mp_image_params real_image_params;   // configured format
    struct mp_image_params image_params;        // texture format (mind hwdec case)
    struct mp_imgfmt_desc image_desc;
    int plane_count;

    bool is_yuv, is_packed_yuv;
    bool has_alpha;
    char color_swizzle[5];
    bool use_integer_conversion;

    struct video_image image;

    bool dumb_mode;
    bool forced_dumb_mode;

    struct fbotex merge_fbo[4];
    struct fbotex scale_fbo[4];
    struct fbotex integer_fbo[4];
    struct fbotex indirect_fbo;
    struct fbotex blend_subs_fbo;
    struct fbotex output_fbo;
    struct fbosurface surfaces[FBOSURFACES_MAX];
    struct fbotex prescale_fbo[MAX_PRESCALE_PASSES];

    int surface_idx;
    int surface_now;
    int frames_drawn;
    bool is_interpolated;
    bool output_fbo_valid;

    // state for configured scalers
    struct scaler scaler[SCALER_COUNT];

    struct mp_csp_equalizer video_eq;

    struct mp_rect src_rect;    // displayed part of the source video
    struct mp_rect dst_rect;    // video rectangle on output window
    struct mp_osd_res osd_rect; // OSD size/margins
    int vp_w, vp_h;

    // temporary during rendering
    struct img_tex pass_tex[TEXUNIT_VIDEO_NUM];
    int pass_tex_num;
    int texture_w, texture_h;
    struct gl_transform texture_offset; // texture transform without rotation
    int components;
    bool use_linear;
    float user_gamma;

    // hooks and saved textures
    struct saved_tex saved_tex[MAX_SAVED_TEXTURES];
    int saved_tex_num;
    struct tex_hook tex_hooks[MAX_TEXTURE_HOOKS];
    int tex_hook_num;
    struct fbotex hook_fbos[MAX_SAVED_TEXTURES];
    int hook_fbo_num;

    int frames_uploaded;
    int frames_rendered;
    AVLFG lfg;

    // Cached because computing it can take relatively long
    int last_dither_matrix_size;
    float *last_dither_matrix;

    struct cached_file files[10];
    int num_files;

    struct gl_hwdec *hwdec;
    bool hwdec_active;

    bool dsi_warned;
    bool custom_shader_fn_warned;
};
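// Describes a packed pixel format: bytes per component and component order.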
struct packed_fmt_entry {
    int fmt;
    int8_t component_size;
    int8_t components[4]; // source component - 0 means unmapped
};

static const struct packed_fmt_entry mp_packed_formats[] = {
    //                  w   R  G  B  A
    {IMGFMT_Y8,         1, {1, 0, 0, 0}},
    {IMGFMT_Y16,        2, {1, 0, 0, 0}},
    {IMGFMT_YA8,        1, {1, 0, 0, 2}},
    {IMGFMT_YA16,       2, {1, 0, 0, 2}},
    {IMGFMT_ARGB,       1, {2, 3, 4, 1}},
    {IMGFMT_0RGB,       1, {2, 3, 4, 0}},
    {IMGFMT_BGRA,       1, {3, 2, 1, 4}},
    {IMGFMT_BGR0,       1, {3, 2, 1, 0}},
    {IMGFMT_ABGR,       1, {4, 3, 2, 1}},
    {IMGFMT_0BGR,       1, {4, 3, 2, 0}},
    {IMGFMT_RGBA,       1, {1, 2, 3, 4}},
    {IMGFMT_RGB0,       1, {1, 2, 3, 0}},
    {IMGFMT_BGR24,      1, {3, 2, 1, 0}},
    {IMGFMT_RGB24,      1, {1, 2, 3, 0}},
    {IMGFMT_RGB48,      2, {1, 2, 3, 0}},
    {IMGFMT_RGBA64,     2, {1, 2, 3, 4}},
    {IMGFMT_BGRA64,     2, {3, 2, 1, 4}},
    {0},
};
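// Default option values for vo_opengl.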
const struct gl_video_opts gl_video_opts_def = {
    .dither_depth = -1,
    .dither_size = 6,
    .temporal_dither_period = 1,
    .fbo_format = 0,
    .sigmoid_center = 0.75,
    .sigmoid_slope = 6.5,
    .scaler = {
        {{"bilinear", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // scale
        {{NULL, .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // dscale
        {{"bilinear", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // cscale
        {{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}},
         .clamp = 1, }, // tscale
    },
    .scaler_resizes_only = 1,
    .scaler_lut_size = 6,
    .interpolation_threshold = 0.0001,
    .alpha_mode = 3,
    .background = {0, 0, 0, 255},
    .gamma = 1.0f,
    .prescale_passes = 1,
    .prescale_downscaling_threshold = 2.0f,
};
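// Defaults for the high quality preset (opengl-hq).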
const struct gl_video_opts gl_video_opts_hq_def = {
|
|
|
|
.dither_depth = 0,
|
|
|
|
.dither_size = 6,
|
2015-07-20 17:09:22 +00:00
|
|
|
.temporal_dither_period = 1,
|
2015-11-19 20:20:50 +00:00
|
|
|
.fbo_format = 0,
|
2015-11-07 16:49:14 +00:00
|
|
|
.correct_downscaling = 1,
|
2015-01-06 09:47:26 +00:00
|
|
|
.sigmoid_center = 0.75,
|
|
|
|
.sigmoid_slope = 6.5,
|
|
|
|
.sigmoid_upscaling = 1,
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
.scaler = {
|
2015-03-27 05:18:32 +00:00
|
|
|
{{"spline36", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // scale
|
|
|
|
{{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // dscale
|
|
|
|
{{"spline36", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // cscale
|
2015-11-29 12:04:01 +00:00
|
|
|
{{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}},
|
|
|
|
.clamp = 1, }, // tscale
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
},
|
2016-01-25 20:35:39 +00:00
|
|
|
.scaler_resizes_only = 1,
|
2015-12-06 16:22:41 +00:00
|
|
|
.scaler_lut_size = 6,
|
2016-01-27 20:07:17 +00:00
|
|
|
.interpolation_threshold = 0.0001,
|
2015-12-22 22:14:47 +00:00
|
|
|
.alpha_mode = 3,
|
2014-12-09 20:34:01 +00:00
|
|
|
.background = {0, 0, 0, 255},
|
2015-02-03 16:12:04 +00:00
|
|
|
.gamma = 1.0f,
|
2015-03-23 01:42:19 +00:00
|
|
|
.blend_subs = 0,
|
2015-09-05 15:39:27 +00:00
|
|
|
.deband = 1,
|
2015-10-26 22:43:48 +00:00
|
|
|
.prescale_passes = 1,
|
|
|
|
.prescale_downscaling_threshold = 2.0f,
|
2013-10-24 20:20:16 +00:00
|
|
|
};
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2013-12-21 19:03:36 +00:00
|
|
|
static int validate_scaler_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
e.g. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regard to the last point, in the future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
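The "mix and match" idea amounts to evaluating the base kernel and multiplying in a window function that has been rescaled to the kernel's radius. A self-contained sketch follows; the demo_* functions are illustrative and are not filter_kernels.c.

#include <math.h>

static double demo_sinc(double x)
{
    if (fabs(x) < 1e-8)
        return 1.0;
    x *= M_PI;
    return sin(x) / x;
}

// Hann(ing) window, defined on [-1, 1].
static double demo_hanning(double x)
{
    return 0.5 + 0.5 * cos(M_PI * x);
}

// A windowed kernel: the base kernel times the window stretched to the radius.
static double demo_windowed_kernel(double x, double radius)
{
    if (fabs(x) >= radius)
        return 0.0;
    return demo_sinc(x) * demo_hanning(x / radius);
}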
2015-03-25 03:40:28 +00:00
|
|
|
static int validate_window_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
#define OPT_BASE_STRUCT struct gl_video_opts
|
2015-08-29 01:24:15 +00:00
|
|
|
|
|
|
|
#define SCALER_OPTS(n, i) \
|
|
|
|
OPT_STRING_VALIDATE(n, scaler[i].kernel.name, 0, validate_scaler_opt), \
|
|
|
|
OPT_FLOAT(n"-param1", scaler[i].kernel.params[0], 0), \
|
|
|
|
OPT_FLOAT(n"-param2", scaler[i].kernel.params[1], 0), \
|
|
|
|
OPT_FLOAT(n"-blur", scaler[i].kernel.blur, 0), \
|
|
|
|
OPT_FLOAT(n"-wparam", scaler[i].window.params[0], 0), \
|
|
|
|
OPT_FLAG(n"-clamp", scaler[i].clamp, 0), \
|
|
|
|
OPT_FLOATRANGE(n"-radius", scaler[i].radius, 0, 0.5, 16.0), \
|
|
|
|
OPT_FLOATRANGE(n"-antiring", scaler[i].antiring, 0, 0.0, 1.0), \
|
|
|
|
OPT_STRING_VALIDATE(n"-window", scaler[i].window.name, 0, validate_window_opt)
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
const struct m_sub_options gl_video_conf = {
|
2014-06-10 21:56:05 +00:00
|
|
|
.opts = (const m_option_t[]) {
|
vo_opengl: restore single pass optimization as separate code path
The single pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
OPT_FLAG("dumb-mode", dumb_mode, 0),
|
2015-02-03 16:12:04 +00:00
|
|
|
OPT_FLOATRANGE("gamma", gamma, 0, 0.1, 2.0),
|
2015-02-07 12:54:18 +00:00
|
|
|
OPT_FLAG("gamma-auto", gamma_auto, 0),
|
2015-03-31 05:31:35 +00:00
|
|
|
OPT_CHOICE_C("target-prim", target_prim, 0, mp_csp_prim_names),
|
|
|
|
OPT_CHOICE_C("target-trc", target_trc, 0, mp_csp_trc_names),
|
2013-03-01 20:19:20 +00:00
|
|
|
OPT_FLAG("pbo", pbo, 0),
|
2016-03-05 08:42:57 +00:00
|
|
|
SCALER_OPTS("scale", SCALER_SCALE),
|
|
|
|
SCALER_OPTS("dscale", SCALER_DSCALE),
|
|
|
|
SCALER_OPTS("cscale", SCALER_CSCALE),
|
|
|
|
SCALER_OPTS("tscale", SCALER_TSCALE),
|
2015-12-05 19:14:23 +00:00
|
|
|
OPT_INTRANGE("scaler-lut-size", scaler_lut_size, 0, 4, 10),
|
2013-05-25 21:47:55 +00:00
|
|
|
OPT_FLAG("scaler-resizes-only", scaler_resizes_only, 0),
|
2015-02-06 02:37:21 +00:00
|
|
|
OPT_FLAG("linear-scaling", linear_scaling, 0),
|
2015-11-07 16:49:14 +00:00
|
|
|
OPT_FLAG("correct-downscaling", correct_downscaling, 0),
|
2015-01-06 09:47:26 +00:00
|
|
|
OPT_FLAG("sigmoid-upscaling", sigmoid_upscaling, 0),
|
|
|
|
OPT_FLOATRANGE("sigmoid-center", sigmoid_center, 0, 0.0, 1.0),
|
|
|
|
OPT_FLOATRANGE("sigmoid-slope", sigmoid_slope, 0, 1.0, 20.0),
|
2013-03-01 20:19:20 +00:00
|
|
|
OPT_CHOICE("fbo-format", fbo_format, 0,
|
|
|
|
({"rgb", GL_RGB},
|
|
|
|
{"rgba", GL_RGBA},
|
|
|
|
{"rgb8", GL_RGB8},
|
2015-11-19 13:45:06 +00:00
|
|
|
{"rgba8", GL_RGBA8},
|
2013-03-01 20:19:20 +00:00
|
|
|
{"rgb10", GL_RGB10},
|
2013-10-23 15:46:57 +00:00
|
|
|
{"rgb10_a2", GL_RGB10_A2},
|
2013-03-01 20:19:20 +00:00
|
|
|
{"rgb16", GL_RGB16},
|
|
|
|
{"rgb16f", GL_RGB16F},
|
2013-03-28 20:44:33 +00:00
|
|
|
{"rgb32f", GL_RGB32F},
|
|
|
|
{"rgba12", GL_RGBA12},
|
|
|
|
{"rgba16", GL_RGBA16},
|
|
|
|
{"rgba16f", GL_RGBA16F},
|
2015-11-19 20:20:50 +00:00
|
|
|
{"rgba32f", GL_RGBA32F},
|
|
|
|
{"auto", 0})),
|
2013-03-28 20:39:17 +00:00
|
|
|
OPT_CHOICE_OR_INT("dither-depth", dither_depth, 0, -1, 16,
|
|
|
|
({"no", -1}, {"auto", 0})),
|
2013-05-25 23:48:39 +00:00
|
|
|
OPT_CHOICE("dither", dither_algo, 0,
|
|
|
|
({"fruit", 0}, {"ordered", 1}, {"no", -1})),
|
|
|
|
OPT_INTRANGE("dither-size-fruit", dither_size, 0, 2, 8),
|
|
|
|
OPT_FLAG("temporal-dither", temporal_dither, 0),
|
2015-07-20 17:09:22 +00:00
|
|
|
OPT_INTRANGE("temporal-dither-period", temporal_dither_period, 0, 1, 128),
|
2015-02-27 17:31:24 +00:00
|
|
|
OPT_CHOICE("alpha", alpha_mode, 0,
|
2013-09-19 14:55:56 +00:00
|
|
|
({"no", 0},
|
2015-02-27 17:31:24 +00:00
|
|
|
{"yes", 1},
|
2015-12-22 22:14:47 +00:00
|
|
|
{"blend", 2},
|
|
|
|
{"blend-tiles", 3})),
|
2013-12-01 22:39:13 +00:00
|
|
|
OPT_FLAG("rectangle-textures", use_rectangle, 0),
|
2014-12-09 20:34:01 +00:00
|
|
|
OPT_COLOR("background", background, 0),
|
2015-03-13 18:30:31 +00:00
|
|
|
OPT_FLAG("interpolation", interpolation, 0),
|
2016-01-27 20:07:17 +00:00
|
|
|
OPT_FLOAT("interpolation-threshold", interpolation_threshold, 0),
|
2015-04-11 17:16:34 +00:00
|
|
|
OPT_CHOICE("blend-subtitles", blend_subs, 0,
|
|
|
|
({"no", 0},
|
|
|
|
{"yes", 1},
|
|
|
|
{"video", 2})),
|
2015-03-27 12:27:40 +00:00
|
|
|
OPT_STRING("scale-shader", scale_shader, 0),
|
|
|
|
OPT_STRINGLIST("pre-shaders", pre_shaders, 0),
|
|
|
|
OPT_STRINGLIST("post-shaders", post_shaders, 0),
|
2016-04-20 23:33:13 +00:00
|
|
|
OPT_STRINGLIST("user-shaders", user_shaders, 0),
|
2015-09-05 15:39:27 +00:00
|
|
|
OPT_FLAG("deband", deband, 0),
|
|
|
|
OPT_SUBSTRUCT("deband", deband_opts, deband_conf, 0),
|
2015-09-23 20:43:27 +00:00
|
|
|
OPT_FLOAT("sharpen", unsharp, 0),
|
2016-03-05 11:02:01 +00:00
|
|
|
OPT_CHOICE("prescale-luma", prescale_luma, 0,
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with the placebo setting) in two different ways: via a uniform buffer
object, or by hard-coding them into the shader source. UBO requires OpenGL 3.1,
which only guarantees 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human-readable
floating point number formats (with really high precision, as needed for
single precision floats), but for some reason they did not work
nicely; bad pixels (with NaN values) could be produced with some
weight sets.
We could also add support for uploading these weights with a texture, just
for compatibility reasons (e.g. upscaling a still image with a low-end
graphics card). But as I tested, it's rather slow even with a 1D
texture (we would probably have to use a 2D texture due to dimension size
limitations). Since there is always a better choice for NNEDI3
upscaling of still images (the vapoursynth plugin), it's not implemented
in this commit. If this turns out to be in popular demand from
users, it should be easy to add later.
For those who want to optimize the performance a bit further, the
bottlenecks seem to be:
1. the overhead of uploading and accessing these weights (in particular,
the shader code is regenerated for each frame, though that happens on the
CPU).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop; there are various fast
implementations with some bit tricks (probably with the help of the
intBitsToFloat function).
The code was tested with an NVIDIA card and driver (355.11) on Linux.
Closes #2230
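To illustrate the hard-coding path described above, the following self-contained sketch shows how a weight can be emitted as its raw IEEE-754 bit pattern so that GLSL's intBitsToFloat() reconstructs it exactly, which a rounded decimal literal may not. The emit_weight helper and the generated variable names are hypothetical, not the code used by this file.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Print one weight as a GLSL constant, encoded through its bit pattern.
static void emit_weight(FILE *out, int index, float w)
{
    int32_t bits;
    memcpy(&bits, &w, sizeof(bits)); // type-pun through memcpy, no UB
    fprintf(out, "const float w%d = intBitsToFloat(%" PRId32 ");\n", index, bits);
}

int main(void)
{
    const float weights[] = {0.123456789f, -1.0f / 3.0f, 1e-7f};
    for (int i = 0; i < 3; i++)
        emit_weight(stdout, i, weights[i]);
    return 0;
}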
2015-10-28 01:37:55 +00:00
|
|
|
({"none", 0},
|
2015-12-03 08:32:40 +00:00
|
|
|
{"superxbr", 1}
|
|
|
|
#if HAVE_NNEDI
|
|
|
|
, {"nnedi3", 2}
|
|
|
|
#endif
|
|
|
|
)),
|
2015-10-26 22:43:48 +00:00
|
|
|
OPT_INTRANGE("prescale-passes",
|
|
|
|
prescale_passes, 0, 1, MAX_PRESCALE_PASSES),
|
|
|
|
OPT_FLOATRANGE("prescale-downscaling-threshold",
|
|
|
|
prescale_downscaling_threshold, 0, 0.0, 32.0),
|
|
|
|
OPT_SUBSTRUCT("superxbr", superxbr_opts, superxbr_conf, 0),
|
2015-10-28 01:37:55 +00:00
|
|
|
OPT_SUBSTRUCT("nnedi3", nnedi3_opts, nnedi3_conf, 0),
|
2015-03-23 01:42:19 +00:00
|
|
|
|
2015-01-13 23:45:31 +00:00
|
|
|
OPT_REMOVED("approx-gamma", "this is always enabled now"),
|
2015-01-20 20:46:19 +00:00
|
|
|
OPT_REMOVED("cscale-down", "chroma is never downscaled"),
|
2015-01-22 17:24:50 +00:00
|
|
|
OPT_REMOVED("scale-sep", "this is set automatically whenever sane"),
|
|
|
|
OPT_REMOVED("indirect", "this is set automatically whenever sane"),
|
2015-03-16 09:17:22 +00:00
|
|
|
OPT_REMOVED("srgb", "use target-prim=bt709:target-trc=srgb instead"),
|
2015-09-05 15:39:27 +00:00
|
|
|
OPT_REMOVED("source-shader", "use :deband to enable debanding"),
|
2015-01-20 20:46:19 +00:00
|
|
|
|
|
|
|
OPT_REPLACED("lscale", "scale"),
|
|
|
|
OPT_REPLACED("lscale-down", "scale-down"),
|
|
|
|
OPT_REPLACED("lparam1", "scale-param1"),
|
|
|
|
OPT_REPLACED("lparam2", "scale-param2"),
|
|
|
|
OPT_REPLACED("lradius", "scale-radius"),
|
|
|
|
OPT_REPLACED("lantiring", "scale-antiring"),
|
|
|
|
OPT_REPLACED("cparam1", "cscale-param1"),
|
|
|
|
OPT_REPLACED("cparam2", "cscale-param2"),
|
|
|
|
OPT_REPLACED("cradius", "cscale-radius"),
|
|
|
|
OPT_REPLACED("cantiring", "cscale-antiring"),
|
2015-03-13 18:30:31 +00:00
|
|
|
OPT_REPLACED("smoothmotion", "interpolation"),
|
2015-03-15 06:11:51 +00:00
|
|
|
OPT_REPLACED("smoothmotion-threshold", "tscale-param1"),
|
2015-03-26 00:55:32 +00:00
|
|
|
OPT_REPLACED("scale-down", "dscale"),
|
2015-11-07 16:49:14 +00:00
|
|
|
OPT_REPLACED("fancy-downscaling", "correct-downscaling"),
|
2016-03-05 11:02:01 +00:00
|
|
|
OPT_REPLACED("prescale", "prescale-luma"),
|
2015-01-20 20:46:19 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
{0}
|
|
|
|
},
|
|
|
|
.size = sizeof(struct gl_video_opts),
|
|
|
|
.defaults = &gl_video_opts_def,
|
|
|
|
};
|
|
|
|
|
|
|
|
static void uninit_rendering(struct gl_video *p);
|
2015-03-26 00:55:32 +00:00
|
|
|
static void uninit_scaler(struct gl_video *p, struct scaler *scaler);
|
2015-09-07 19:09:06 +00:00
|
|
|
static void check_gl_features(struct gl_video *p);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
static bool init_format(struct gl_video *p, int fmt, bool test_only);
|
|
|
|
static void init_image_desc(struct gl_video *p, int fmt);
|
2015-07-15 10:22:49 +00:00
|
|
|
static void gl_video_upload_image(struct gl_video *p, struct mp_image *mpi);
|
2015-09-08 20:46:36 +00:00
|
|
|
static void assign_options(struct gl_video_opts *dst, struct gl_video_opts *src);
|
2016-04-08 20:21:31 +00:00
|
|
|
static void get_scale_factors(struct gl_video *p, bool transpose_rot, double xy[2]);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
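A minimal sketch of the caching idea described above, using entirely hypothetical names (the real cache is the gl_sc_* shader-cache code, not this): the generated GLSL text itself is the lookup key, so regenerating identical source every frame never triggers a recompile.

#define DEMO_CACHE_SIZE 32

struct demo_cached_program {
    char *source;        // full generated GLSL text (owned by the cache)
    unsigned gl_program; // compiled and linked GL program object
};

static struct demo_cached_program demo_cache[DEMO_CACHE_SIZE];
static int demo_cache_num;

// Return a previously compiled program for identical source, or 0 on a miss
// (in which case the caller would compile, link and insert it).
// strcmp comes from <string.h>, which this file already includes.
static unsigned demo_lookup_program(const char *source)
{
    for (int n = 0; n < demo_cache_num; n++) {
        if (strcmp(demo_cache[n].source, source) == 0)
            return demo_cache[n].gl_program;
    }
    return 0;
}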
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
#define GLSL(x) gl_sc_add(p->sc, #x "\n");
|
|
|
|
#define GLSLF(...) gl_sc_addf(p->sc, __VA_ARGS__)
|
2016-02-27 22:56:33 +00:00
|
|
|
#define GLSLHF(...) gl_sc_haddf(p->sc, __VA_ARGS__)
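For orientation, this is roughly how the macros above are used while a pass is being generated; the function below and the GLSL identifiers in it are purely illustrative, not part of this file.

// Illustrative only: each macro call appends text to the shader currently
// being assembled in p->sc.
static void example_emit_sample(struct gl_video *p)
{
    GLSLF("// example: sample the first input plane\n");
    GLSL(vec4 color = texture(texture0, texcoord0);)
    GLSLF("color *= %f;\n", 1.0);
}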
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
static struct bstr load_cached_file(struct gl_video *p, const char *path)
|
2015-09-23 20:13:03 +00:00
|
|
|
{
|
|
|
|
if (!path || !path[0])
|
2016-04-20 23:33:13 +00:00
|
|
|
return (struct bstr){0};
|
2015-09-23 20:13:03 +00:00
|
|
|
for (int n = 0; n < p->num_files; n++) {
|
|
|
|
if (strcmp(p->files[n].path, path) == 0)
|
|
|
|
return p->files[n].body;
|
|
|
|
}
|
|
|
|
// not found -> load it
|
|
|
|
if (p->num_files == MP_ARRAY_SIZE(p->files)) {
|
|
|
|
// empty cache when it overflows
|
|
|
|
for (int n = 0; n < p->num_files; n++) {
|
|
|
|
talloc_free(p->files[n].path);
|
2016-04-20 23:33:13 +00:00
|
|
|
talloc_free(p->files[n].body.start);
|
2015-09-23 20:13:03 +00:00
|
|
|
}
|
|
|
|
p->num_files = 0;
|
|
|
|
}
|
|
|
|
struct bstr s = stream_read_file(path, p, p->global, 100000); // 100 kB
|
|
|
|
if (s.len) {
|
|
|
|
struct cached_file *new = &p->files[p->num_files++];
|
|
|
|
*new = (struct cached_file) {
|
|
|
|
.path = talloc_strdup(p, path),
|
2016-04-20 23:33:13 +00:00
|
|
|
.body = s,
|
2015-09-23 20:13:03 +00:00
|
|
|
};
|
|
|
|
return new->body;
|
|
|
|
}
|
2016-04-20 23:33:13 +00:00
|
|
|
return (struct bstr){0};
|
2015-09-23 20:13:03 +00:00
|
|
|
}
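Illustrative usage (a hypothetical helper, not part of this file): a path-valued option can be read through this cache, so repeated lookups of the same path do not hit the filesystem again.

static struct bstr example_load_scale_shader(struct gl_video *p)
{
    // p->opts.scale_shader is the string set by the "scale-shader" option
    // declared above; an empty or unset path yields an empty bstr.
    return load_cached_file(p, p->opts.scale_shader);
}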
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
static void debug_check_gl(struct gl_video *p, const char *msg)
|
|
|
|
{
|
|
|
|
if (p->gl_debug)
|
2013-09-11 22:57:32 +00:00
|
|
|
glCheckError(p->gl, p->log, msg);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_set_debug(struct gl_video *p, bool enable)
|
|
|
|
{
|
2014-12-23 01:46:44 +00:00
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
p->gl_debug = enable;
|
2015-01-30 10:12:58 +00:00
|
|
|
if (p->gl->debug_context)
|
|
|
|
gl_set_debug_logger(gl, enable ? p->log : NULL);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
static void gl_video_reset_surfaces(struct gl_video *p)
|
|
|
|
{
|
2015-11-28 14:45:35 +00:00
|
|
|
for (int i = 0; i < FBOSURFACES_MAX; i++)
|
2015-06-26 08:59:57 +00:00
|
|
|
p->surfaces[i].pts = MP_NOPTS_VALUE;
|
2015-03-12 21:18:16 +00:00
|
|
|
p->surface_idx = 0;
|
|
|
|
p->surface_now = 0;
|
2015-07-02 11:17:20 +00:00
|
|
|
p->frames_drawn = 0;
|
2015-09-05 10:02:02 +00:00
|
|
|
p->output_fbo_valid = false;
|
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
static void gl_video_reset_hooks(struct gl_video *p)
|
|
|
|
{
|
|
|
|
for (int i = 0; i < p->tex_hook_num; i++) {
|
|
|
|
if (p->tex_hooks[i].free)
|
|
|
|
p->tex_hooks[i].free(&p->tex_hooks[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
p->tex_hook_num = 0;
|
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
static inline int fbosurface_wrap(int id)
|
2015-03-12 21:18:16 +00:00
|
|
|
{
|
2015-03-13 18:30:31 +00:00
|
|
|
id = id % FBOSURFACES_MAX;
|
|
|
|
return id < 0 ? id + FBOSURFACES_MAX : id;
|
2015-03-12 21:18:16 +00:00
|
|
|
}
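A short note on how this helper is meant to be used; the function below is a sketch for illustration, not part of the real code.

// The FBOSURFACES_MAX surfaces form a ring buffer; fbosurface_wrap() maps any
// relative index back into [0, FBOSURFACES_MAX - 1], including negative ones.
static void fbosurface_wrap_example(struct gl_video *p)
{
    int next = fbosurface_wrap(p->surface_idx + 1); // one step forward
    int prev = fbosurface_wrap(p->surface_idx - 1); // wraps to the end at index 0
    (void)next;
    (void)prev;
}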
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
static void recreate_osd(struct gl_video *p)
|
|
|
|
{
|
2015-03-23 15:32:59 +00:00
|
|
|
mpgl_osd_destroy(p->osd);
|
|
|
|
p->osd = NULL;
|
|
|
|
if (p->osd_state) {
|
|
|
|
p->osd = mpgl_osd_init(p->gl, p->log, p->osd_state);
|
|
|
|
mpgl_osd_set_options(p->osd, p->opts.pbo);
|
|
|
|
}
|
2015-02-03 16:12:04 +00:00
|
|
|
}
|
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
static void gl_video_setup_hooks(struct gl_video *p);
|
2013-03-01 20:19:20 +00:00
|
|
|
static void reinit_rendering(struct gl_video *p)
|
|
|
|
{
|
2013-07-31 19:44:21 +00:00
|
|
|
MP_VERBOSE(p, "Reinit rendering.\n");
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
debug_check_gl(p, "before scaler initialization");
|
|
|
|
|
|
|
|
uninit_rendering(p);
|
|
|
|
|
|
|
|
recreate_osd(p);
|
2016-04-20 23:33:13 +00:00
|
|
|
|
|
|
|
gl_video_setup_hooks(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void uninit_rendering(struct gl_video *p)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++)
|
2015-03-26 00:55:32 +00:00
|
|
|
uninit_scaler(p, &p->scaler[n]);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
gl->DeleteTextures(1, &p->dither_texture);
|
|
|
|
p->dither_texture = 0;
|
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-10-28 01:37:55 +00:00
|
|
|
gl->DeleteBuffers(1, &p->nnedi3_weights_buffer);
|
2016-01-03 22:11:23 +00:00
|
|
|
p->nnedi3_weights_buffer = 0;
|
2015-10-28 01:37:55 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
for (int n = 0; n < 4; n++) {
|
|
|
|
fbotex_uninit(&p->merge_fbo[n]);
|
|
|
|
fbotex_uninit(&p->scale_fbo[n]);
|
|
|
|
fbotex_uninit(&p->integer_fbo[n]);
|
|
|
|
}
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
fbotex_uninit(&p->indirect_fbo);
|
2015-03-23 01:42:19 +00:00
|
|
|
fbotex_uninit(&p->blend_subs_fbo);
|
2015-03-27 12:27:40 +00:00
|
|
|
|
2016-04-16 16:14:32 +00:00
|
|
|
for (int pass = 0; pass < MAX_PRESCALE_PASSES; pass++)
|
|
|
|
fbotex_uninit(&p->prescale_fbo[pass]);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
2015-03-13 23:32:20 +00:00
|
|
|
for (int n = 0; n < FBOSURFACES_MAX; n++)
|
|
|
|
fbotex_uninit(&p->surfaces[n].fbotex);
|
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
for (int n = 0; n < MAX_SAVED_TEXTURES; n++)
|
|
|
|
fbotex_uninit(&p->hook_fbos[n]);
|
|
|
|
|
2015-03-12 21:18:16 +00:00
|
|
|
gl_video_reset_surfaces(p);
|
2016-04-20 23:33:13 +00:00
|
|
|
gl_video_reset_hooks(p);
|
2016-05-12 09:27:00 +00:00
|
|
|
|
|
|
|
gl_sc_reset_error(p->sc);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
void gl_video_update_profile(struct gl_video *p)
|
|
|
|
{
|
|
|
|
if (p->use_lut_3d)
|
|
|
|
return;
|
|
|
|
|
|
|
|
p->use_lut_3d = true;
|
|
|
|
check_gl_features(p);
|
|
|
|
|
|
|
|
reinit_rendering(p);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool gl_video_get_lut3d(struct gl_video *p, enum mp_csp_prim prim,
|
|
|
|
enum mp_csp_trc trc)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
if (!p->cms || !p->use_lut_3d)
|
|
|
|
return false;
|
2014-03-24 22:30:12 +00:00
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
if (!gl_lcms_has_changed(p->cms, prim, trc))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
struct lut3d *lut3d = NULL;
|
|
|
|
if (!gl_lcms_get_lut3d(p->cms, &lut3d, prim, trc) || !lut3d) {
|
|
|
|
return false;
|
2015-11-19 20:21:04 +00:00
|
|
|
}
|
2014-12-23 01:48:58 +00:00
|
|
|
|
2014-03-24 22:30:12 +00:00
|
|
|
if (!p->lut_3d_texture)
|
|
|
|
gl->GenTextures(1, &p->lut_3d_texture);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + TEXUNIT_3DLUT);
|
|
|
|
gl->BindTexture(GL_TEXTURE_3D, p->lut_3d_texture);
|
|
|
|
gl->TexImage3D(GL_TEXTURE_3D, 0, GL_RGB16, lut3d->size[0], lut3d->size[1],
|
|
|
|
lut3d->size[2], 0, GL_RGB, GL_UNSIGNED_SHORT, lut3d->data);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
|
|
|
|
|
|
|
debug_check_gl(p, "after 3d lut creation");
|
2014-03-24 22:30:12 +00:00
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
return true;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
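A brief, illustrative note on how a LUT uploaded this way gets applied later; the exact GLSL emitted elsewhere in this file may differ.

// Illustrative only: with the 3D texture bound to TEXUNIT_3DLUT, the color
// management step can use the (0..1) RGB value itself as the texture
// coordinate, roughly along the lines of
//     color.rgb = texture(lut_3d, color.rgb).rgb;
// GL_LINEAR on a 3D texture then gives trilinear interpolation between
// neighbouring LUT entries.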
|
|
|
|
|
2016-03-05 10:29:19 +00:00
|
|
|
// Fill an img_tex struct from an FBO + some metadata
|
2016-04-16 16:14:32 +00:00
|
|
|
static struct img_tex img_tex_fbo(struct fbotex *fbo, enum plane_type type,
|
|
|
|
int components)
|
2015-03-12 21:18:16 +00:00
|
|
|
{
|
2016-03-05 10:29:19 +00:00
|
|
|
assert(type != PLANE_NONE);
|
|
|
|
return (struct img_tex){
|
|
|
|
.type = type,
|
|
|
|
.gl_tex = fbo->texture,
|
2015-03-12 21:18:16 +00:00
|
|
|
.gl_target = GL_TEXTURE_2D,
|
2016-03-05 10:29:19 +00:00
|
|
|
.multiplier = 1.0,
|
|
|
|
.use_integer = false,
|
|
|
|
.tex_w = fbo->rw,
|
|
|
|
.tex_h = fbo->rh,
|
|
|
|
.w = fbo->lw,
|
|
|
|
.h = fbo->lh,
|
2016-04-16 16:14:32 +00:00
|
|
|
.transform = identity_trans,
|
2016-03-05 10:29:19 +00:00
|
|
|
.components = components,
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes "upscale before blending" the default strategy.
This is justified because the "render after blending" approach doesn't seem
to work consistently anyway (it introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
};
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
// Bind an img_tex to a free texture unit and return its ID. At most
|
|
|
|
// TEXUNIT_VIDEO_NUM texture units can be bound at once
|
|
|
|
static int pass_bind(struct gl_video *p, struct img_tex tex)
|
2013-03-28 19:40:19 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
assert(p->pass_tex_num < TEXUNIT_VIDEO_NUM);
|
|
|
|
p->pass_tex[p->pass_tex_num] = tex;
|
|
|
|
return p->pass_tex_num++;
|
|
|
|
}
|
2013-03-28 19:40:19 +00:00
|
|
|
|
2016-04-08 20:21:31 +00:00
|
|
|
// Rotation by 90° and flipping.
|
|
|
|
static void get_plane_source_transform(struct gl_video *p, int w, int h,
|
|
|
|
struct gl_transform *out_tr)
|
|
|
|
{
|
|
|
|
struct gl_transform tr = identity_trans;
|
|
|
|
int a = p->image_params.rotate % 90 ? 0 : p->image_params.rotate / 90;
|
|
|
|
int sin90[4] = {0, 1, 0, -1}; // just to avoid rounding issues etc.
|
|
|
|
int cos90[4] = {1, 0, -1, 0};
|
|
|
|
struct gl_transform rot = {{{cos90[a], sin90[a]}, {-sin90[a], cos90[a]}}};
|
|
|
|
gl_transform_trans(rot, &tr);
|
|
|
|
|
|
|
|
// basically, recenter to keep the whole image in view
|
|
|
|
float b[2] = {1, 1};
|
|
|
|
gl_transform_vec(rot, &b[0], &b[1]);
|
|
|
|
tr.t[0] += b[0] < 0 ? w : 0;
|
|
|
|
tr.t[1] += b[1] < 0 ? h : 0;
|
|
|
|
|
|
|
|
if (p->image.image_flipped) {
|
|
|
|
struct gl_transform flip = {{{1, 0}, {0, -1}}, {0, h}};
|
|
|
|
gl_transform_trans(flip, &tr);
|
|
|
|
}
|
|
|
|
|
|
|
|
*out_tr = tr;
|
|
|
|
}
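As a worked illustration of the re-centering above, the following standalone sketch reproduces the sin90/cos90 table and applies the matrix to the corner vector (1, 1); the row-vector multiplication order is an assumption about gl_transform_vec, used here only to show that exactly one w/h offset kicks in per 90° step (both for 180°):

#include <stdio.h>

int main(void)
{
    int sin90[4] = {0, 1, 0, -1};
    int cos90[4] = {1, 0, -1, 0};
    for (int a = 0; a < 4; a++) {
        float m[2][2] = {{cos90[a], sin90[a]}, {-sin90[a], cos90[a]}};
        float x = 1, y = 1;
        // Assumed convention: the rows of m are applied to the (x, y) vector.
        float rx = m[0][0] * x + m[0][1] * y;
        float ry = m[1][0] * x + m[1][1] * y;
        printf("rotate=%3d: (1,1) -> (%+.0f,%+.0f)%s%s\n", a * 90, rx, ry,
               rx < 0 ? "  -> add w" : "", ry < 0 ? "  -> add h" : "");
    }
    return 0;
}

For rotate=90 the corner maps to (1, -1), so only the +h offset is needed; for rotate=180 both offsets apply, matching the "recenter to keep the whole image in view" comment.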
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
// Places a video_image's image textures + associated metadata into tex[]. The
|
2016-04-16 16:14:32 +00:00
|
|
|
// number of textures is equal to p->plane_count. Any necessary plane offsets
|
|
|
|
// are stored in off. (e.g. chroma position)
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
static void pass_get_img_tex(struct gl_video *p, struct video_image *vimg,
|
2016-04-16 16:14:32 +00:00
|
|
|
struct img_tex tex[4], struct gl_transform off[4])
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
{
|
2015-01-22 17:29:37 +00:00
|
|
|
assert(vimg->mpi);
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
// Determine the chroma offset
|
vo_opengl: refactor shader generation (part 2)
2015-03-12 21:18:16 +00:00
|
|
|
float ls_w = 1.0 / (1 << p->image_desc.chroma_xs);
|
|
|
|
float ls_h = 1.0 / (1 << p->image_desc.chroma_ys);
|
|
|
|
|
2016-04-16 16:14:32 +00:00
|
|
|
struct gl_transform chroma = {{{ls_w, 0.0}, {0.0, ls_h}}};
|
|
|
|
|
2015-04-02 21:59:50 +00:00
|
|
|
if (p->image_params.chroma_location != MP_CHROMA_CENTER) {
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
int cx, cy;
|
2015-04-02 21:59:50 +00:00
|
|
|
mp_get_chroma_location(p->image_params.chroma_location, &cx, &cy);
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
// By default texture coordinates are such that chroma is centered with
|
|
|
|
// any chroma subsampling. If a specific direction is given, make it
|
|
|
|
// so that the luma and chroma samples line up exactly.
|
|
|
|
// For 4:4:4, setting chroma location should have no effect at all.
|
|
|
|
// luma sample size (in chroma coord. space)
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
chroma.t[0] = ls_w < 1 ? ls_w * -cx / 2 : 0;
|
|
|
|
chroma.t[1] = ls_h < 1 ? ls_h * -cy / 2 : 0;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
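A worked example of the offset computed above, for 4:2:0 video with left-sited chroma; the value cx = -1 (and cy = 0) is an assumption about what mp_get_chroma_location() reports for that case, used only to show the magnitude of the resulting shift:

#include <stdio.h>

int main(void)
{
    int chroma_xs = 1, chroma_ys = 1;          // 4:2:0 subsampling
    float ls_w = 1.0f / (1 << chroma_xs);      // 0.5
    float ls_h = 1.0f / (1 << chroma_ys);      // 0.5
    int cx = -1, cy = 0;                       // assumed left-sited chroma
    float tx = ls_w < 1 ? ls_w * -cx / 2 : 0;  // 0.25
    float ty = ls_h < 1 ? ls_h * -cy / 2 : 0;  // 0.0
    printf("chroma offset = (%g, %g)\n", tx, ty);
    return 0;
}

For 4:4:4 both ls_w and ls_h are 1, so the conditionals force the offset to 0, which is exactly the "no effect at all" case mentioned in the comments.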
|
|
|
|
|
2016-04-16 16:14:32 +00:00
|
|
|
// FIXME: account for rotation in the chroma offset
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
|
|
|
|
// The existing code assumes we just have a single tex multiplier for
|
|
|
|
// all of the planes. This may change in the future
|
|
|
|
float tex_mul = 1.0 / mp_get_csp_mul(p->image_params.colorspace,
|
|
|
|
p->image_desc.component_bits,
|
|
|
|
p->image_desc.component_full_bits);
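For intuition only: the simplest part of such a multiplier is plain bit-depth expansion, e.g. 10-bit samples stored in the low bits of a 16-bit normalized texture read back as code/65535 and need a factor of 65535/1023 ≈ 64.06 to reach full scale. The sketch below shows just that generic arithmetic; it is not the formula mp_get_csp_mul() uses, which also depends on the colorspace and range.

// Generic bit-depth expansion factor, assuming unsigned samples padded into
// a wider normalized texture format (illustration only).
static double expansion_mul(int component_bits, int texture_bits)
{
    double tex_max  = (1 << texture_bits) - 1;   // e.g. 65535 for 16-bit
    double comp_max = (1 << component_bits) - 1; // e.g. 1023 for 10-bit
    return tex_max / comp_max;                   // maps code/65535 to code/1023
}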
|
vo_opengl: refactor shader generation (part 2)
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
memset(tex, 0, 4 * sizeof(tex[0]));
|
2015-03-13 12:42:05 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
struct texplane *t = &vimg->planes[n];
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
|
|
|
|
enum plane_type type;
|
|
|
|
if (n >= 3) {
|
|
|
|
type = PLANE_ALPHA;
|
|
|
|
} else if (p->image_desc.flags & MP_IMGFLAG_RGB) {
|
|
|
|
type = PLANE_RGB;
|
|
|
|
} else if (p->image_desc.flags & MP_IMGFLAG_YUV) {
|
|
|
|
type = n == 0 ? PLANE_LUMA : PLANE_CHROMA;
|
|
|
|
} else if (p->image_desc.flags & MP_IMGFLAG_XYZ) {
|
|
|
|
type = PLANE_XYZ;
|
|
|
|
} else {
|
|
|
|
abort();
|
|
|
|
}
|
|
|
|
|
|
|
|
tex[n] = (struct img_tex){
|
|
|
|
.type = type,
|
2016-01-26 19:47:32 +00:00
|
|
|
.gl_tex = t->gl_texture,
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
.gl_target = t->gl_target,
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
.multiplier = tex_mul,
|
2016-01-26 19:47:32 +00:00
|
|
|
.use_integer = t->use_integer,
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out an init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
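For orientation, a hypothetical sketch of the kind of per-frame description the commit message talks about; apart from the gl_target member named above, the struct names and fields here are assumptions for illustration, not mpv's actual definitions:

#include <GL/gl.h>

// Hypothetical shape of a per-frame texture layout description filled out
// by map_frame (field set assumed for illustration).
struct example_hwdec_plane {
    GLuint gl_texture;  // texture exported by the interop backend
    GLenum gl_target;   // the member named in the commit message
    int tex_w, tex_h;   // padded ("physical") texture size
};

struct example_hwdec_frame {
    struct example_hwdec_plane planes[4];
};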
|
|
|
.tex_w = t->tex_w,
|
|
|
|
.tex_h = t->tex_h,
|
2015-09-02 10:52:11 +00:00
|
|
|
.w = t->w,
|
|
|
|
.h = t->h,
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
|
|
.components = p->image_desc.components[n],
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
};
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
snprintf(tex[n].swizzle, sizeof(tex[n].swizzle), "%s", t->swizzle);
|
2016-04-16 16:14:32 +00:00
|
|
|
get_plane_source_transform(p, t->w, t->h, &tex[n].transform);
|
2016-04-08 20:21:31 +00:00
|
|
|
if (p->image_params.rotate % 180 == 90)
|
|
|
|
MPSWAP(int, tex[n].w, tex[n].h);
|
2016-04-16 16:14:32 +00:00
|
|
|
|
|
|
|
off[n] = type == PLANE_CHROMA ? chroma : identity_trans;
|
2013-03-28 19:40:19 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
static void init_video(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
if (p->hwdec && p->hwdec->driver->imgfmt == p->image_params.imgfmt) {
|
2015-01-29 18:53:49 +00:00
|
|
|
if (p->hwdec->driver->reinit(p->hwdec, &p->image_params) < 0)
|
|
|
|
MP_ERR(p, "Initializing texture for hardware decoding failed.\n");
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
init_image_desc(p, p->image_params.imgfmt);
|
|
|
|
p->hwdec_active = true;
|
|
|
|
} else {
|
|
|
|
init_format(p, p->image_params.imgfmt, false);
|
2015-01-29 18:53:49 +00:00
|
|
|
}
|
2013-12-01 22:39:13 +00:00
|
|
|
|
2016-05-11 15:39:38 +00:00
|
|
|
// Format-dependent checks.
|
|
|
|
check_gl_features(p);
|
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
mp_image_params_guess_csp(&p->image_params);
|
2013-12-01 22:39:13 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
int eq_caps = MP_CSP_EQ_CAPS_GAMMA;
|
2015-12-12 13:47:30 +00:00
|
|
|
if (p->image_params.colorspace != MP_CSP_BT_2020_C)
|
2013-03-01 20:19:20 +00:00
|
|
|
eq_caps |= MP_CSP_EQ_CAPS_COLORMATRIX;
|
2014-03-31 02:51:47 +00:00
|
|
|
if (p->image_desc.flags & MP_IMGFLAG_XYZ)
|
|
|
|
eq_caps |= MP_CSP_EQ_CAPS_BRIGHTNESS;
|
2013-03-01 20:19:20 +00:00
|
|
|
p->video_eq.capabilities = eq_caps;
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
av_lfg_init(&p->lfg, 1);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
debug_check_gl(p, "before video texture creation");
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
if (!p->hwdec_active) {
|
|
|
|
struct video_image *vimg = &p->image;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
GLenum gl_target =
|
|
|
|
p->opts.use_rectangle ? GL_TEXTURE_RECTANGLE : GL_TEXTURE_2D;
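The choice matters later when texture coordinates are generated: GL_TEXTURE_RECTANGLE samples with unnormalized pixel coordinates, while GL_TEXTURE_2D uses the normalized [0,1] range. A generic sketch of that difference (not code from this file):

#include <stdbool.h>

// Convert a pixel position to the coordinate convention of the chosen
// texture target (illustration only).
static void pixel_to_tex_coord(bool use_rectangle, int px, int py,
                               int tex_w, int tex_h, float out[2])
{
    if (use_rectangle) {
        out[0] = px;                  // rectangle textures sample in pixels
        out[1] = py;
    } else {
        out[0] = px / (float)tex_w;   // 2D textures sample in [0,1]
        out[1] = py / (float)tex_h;
    }
}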
|
2015-09-02 11:08:18 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
struct mp_image layout = {0};
|
|
|
|
mp_image_set_params(&layout, &p->image_params);
|
|
|
|
|
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2013-03-28 19:40:19 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|
plane->gl_target = gl_target;
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
2016-05-10 16:29:10 +00:00
|
|
|

            plane->w = plane->tex_w = mp_image_plane_w(&layout, n);
            plane->h = plane->tex_h = mp_image_plane_h(&layout, n);

            gl->ActiveTexture(GL_TEXTURE0 + n);
            gl->GenTextures(1, &plane->gl_texture);
            gl->BindTexture(gl_target, plane->gl_texture);

            gl->TexImage2D(gl_target, 0, plane->gl_internal_format,
                           plane->w, plane->h, 0,
                           plane->gl_format, plane->gl_type, NULL);

            int filter = plane->use_integer ? GL_NEAREST : GL_LINEAR;
            gl->TexParameteri(gl_target, GL_TEXTURE_MIN_FILTER, filter);
            gl->TexParameteri(gl_target, GL_TEXTURE_MAG_FILTER, filter);
            gl->TexParameteri(gl_target, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
            gl->TexParameteri(gl_target, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

            MP_VERBOSE(p, "Texture for plane %d: %dx%d\n", n, plane->w, plane->h);
        }
        gl->ActiveTexture(GL_TEXTURE0);
    }

    debug_check_gl(p, "after video texture creation");

    reinit_rendering(p);
}
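
// Illustrative note (not part of the original source; sizes assumed): for a
// plain 8-bit 4:2:0 frame such as 1280x720 yuv420p, p->plane_count is 3 and
// the loop above creates one 1280x720 luma texture plus two 640x360 chroma
// textures, since mp_image_plane_w/h() apply the layout's chroma subsampling.
// All planes are clamped to edge and filtered with GL_LINEAR, except for
// integer texture formats, which must use GL_NEAREST.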

static void unref_current_image(struct gl_video *p)
{
    struct video_image *vimg = &p->image;

    if (vimg->hwdec_mapped) {
        assert(p->hwdec_active);
        if (p->hwdec->driver->unmap)
            p->hwdec->driver->unmap(p->hwdec);
        memset(vimg->planes, 0, sizeof(vimg->planes));
        vimg->hwdec_mapped = false;
    }

    mp_image_unrefp(&vimg->mpi);
}

static void uninit_video(struct gl_video *p)
{
    GL *gl = p->gl;

    uninit_rendering(p);

    struct video_image *vimg = &p->image;

    unref_current_image(p);

    for (int n = 0; n < p->plane_count; n++) {
        struct texplane *plane = &vimg->planes[n];

        gl->DeleteTextures(1, &plane->gl_texture);
        gl->DeleteBuffers(1, &plane->gl_buffer);
    }
    *vimg = (struct video_image){0};

    // Invalidate image_params to ensure that gl_video_config() will call
    // init_video() on uninitialized gl_video.
    p->real_image_params = (struct mp_image_params){0};
    p->image_params = p->real_image_params;
    p->hwdec_active = false;
}

static void pass_prepare_src_tex(struct gl_video *p)
{
    GL *gl = p->gl;
    struct gl_shader_cache *sc = p->sc;

    for (int n = 0; n < p->pass_tex_num; n++) {
        struct img_tex *s = &p->pass_tex[n];
        if (!s->gl_tex)
            continue;

        char texture_name[32];
        char texture_size[32];
        char pixel_size[32];
        snprintf(texture_name, sizeof(texture_name), "texture%d", n);
        snprintf(texture_size, sizeof(texture_size), "texture_size%d", n);
        snprintf(pixel_size, sizeof(pixel_size), "pixel_size%d", n);

        if (s->use_integer) {
            gl_sc_uniform_sampler_ui(sc, texture_name, n);
        } else {
            gl_sc_uniform_sampler(sc, texture_name, s->gl_target, n);
        }
        float f[2] = {1, 1};
        if (s->gl_target != GL_TEXTURE_RECTANGLE) {
            f[0] = s->tex_w;
            f[1] = s->tex_h;
        }
        gl_sc_uniform_vec2(sc, texture_size, f);
        gl_sc_uniform_vec2(sc, pixel_size, (GLfloat[]){1.0f / f[0],
                                                       1.0f / f[1]});

        gl->ActiveTexture(GL_TEXTURE0 + n);
        gl->BindTexture(s->gl_target, s->gl_tex);
    }
    gl->ActiveTexture(GL_TEXTURE0);
}
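
// Sketch of the resulting shader interface (assumed, derived from the
// snprintf() names above): for pass_tex[0] on texture unit 0, the shader
// cache ends up declaring roughly
//
//     uniform sampler2D texture0;    // usampler2D for integer, sampler2DRect for rectangle textures
//     uniform vec2 texture_size0;    // {tex_w, tex_h}, or {1, 1} for rectangle textures
//     uniform vec2 pixel_size0;      // componentwise reciprocal of texture_size0
//
// and the texture itself is bound to the matching texture unit at the end of
// the loop body.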

static void render_pass_quad(struct gl_video *p, int vp_w, int vp_h,
                             const struct mp_rect *dst)
{
    struct vertex va[4] = {0};

    struct gl_transform t;
    gl_transform_ortho(&t, 0, vp_w, 0, vp_h);

    float x[2] = {dst->x0, dst->x1};
    float y[2] = {dst->y0, dst->y1};
    gl_transform_vec(t, &x[0], &y[0]);
    gl_transform_vec(t, &x[1], &y[1]);

    for (int n = 0; n < 4; n++) {
        struct vertex *v = &va[n];
        v->position.x = x[n / 2];
        v->position.y = y[n % 2];
        for (int i = 0; i < p->pass_tex_num; i++) {
            struct img_tex *s = &p->pass_tex[i];
            if (!s->gl_tex)
                continue;
            struct gl_transform tr = s->transform;
            float tx = (n / 2) * s->w;
            float ty = (n % 2) * s->h;
            gl_transform_vec(tr, &tx, &ty);
            bool rect = s->gl_target == GL_TEXTURE_RECTANGLE;
            v->texcoord[i].x = tx / (rect ? 1 : s->tex_w);
            v->texcoord[i].y = ty / (rect ? 1 : s->tex_h);
        }
    }

    p->gl->Viewport(0, 0, vp_w, abs(vp_h));
    gl_vao_draw_data(&p->vao, GL_TRIANGLE_STRIP, va, 4);

    debug_check_gl(p, "after rendering");
}
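
// Worked example (derived from the n / 2 and n % 2 indexing above): the strip
// vertices are emitted as
//
//     n = 0 -> (x[0], y[0])    n = 1 -> (x[0], y[1])
//     n = 2 -> (x[1], y[0])    n = 3 -> (x[1], y[1])
//
// so GL_TRIANGLE_STRIP covers the destination rectangle with the triangles
// (0,1,2) and (1,2,3). Texture coordinates follow the same corner pattern,
// scaled by each source's w/h, run through its transform, and divided by
// tex_w/tex_h unless the source is a rectangle texture.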

// flags: see render_pass_quad
static void finish_pass_direct(struct gl_video *p, GLint fbo, int vp_w, int vp_h,
                               const struct mp_rect *dst)
{
    GL *gl = p->gl;
    pass_prepare_src_tex(p);
    gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
    gl_sc_gen_shader_and_reset(p->sc);
    render_pass_quad(p, vp_w, vp_h, dst);
    gl->BindFramebuffer(GL_FRAMEBUFFER, 0);
    memset(&p->pass_tex, 0, sizeof(p->pass_tex));
    p->pass_tex_num = 0;
}
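
// Minimal usage sketch (hypothetical pass; everything except pass_bind(),
// GLSL()/GLSLF() and finish_pass_direct() is made up for illustration): a
// render pass is assembled by binding its inputs and emitting GLSL, then
// flushed in one go.
#if 0
static void example_invert_pass(struct gl_video *p, struct img_tex tex,
                                int vp_w, int vp_h, struct mp_rect dst)
{
    int id = pass_bind(p, tex);                  // reserve a texture slot for this pass
    GLSLF("color = texture(texture%d, texcoord%d);\n", id, id);
    GLSL(color.rgb = 1.0 - color.rgb;)           // arbitrary example effect
    finish_pass_direct(p, 0, vp_w, vp_h, &dst);  // fbo 0 = default framebuffer
}
#endif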

// dst_fbo: this will be used for rendering; possibly reallocating the whole
//          FBO, if the required parameters have changed
// w, h: required FBO target dimension, and also defines the target rectangle
//       used for rasterization
// flags: 0 or combination of FBOTEX_FUZZY_W/FBOTEX_FUZZY_H (setting the fuzzy
//        flags allows the FBO to be larger than the w/h parameters)
static void finish_pass_fbo(struct gl_video *p, struct fbotex *dst_fbo,
                            int w, int h, int flags)
{
    fbotex_change(dst_fbo, p->gl, p->log, w, h, p->opts.fbo_format, flags);

finish_pass_direct(p, dst_fbo->fbo, dst_fbo->rw, dst_fbo->rh,
|
2016-03-28 14:30:48 +00:00
|
|
|
&(struct mp_rect){0, 0, w, h});
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
// Copy a texture to the vec4 color, while increasing offset. Also applies
|
|
|
|
// the texture multiplier to the sampled color
|
|
|
|
static void copy_img_tex(struct gl_video *p, int *offset, struct img_tex img)
|
|
|
|
{
|
|
|
|
int count = img.components;
|
|
|
|
assert(*offset + count <= 4);
|
|
|
|
|
|
|
|
int id = pass_bind(p, img);
|
|
|
|
char src[5] = {0};
|
|
|
|
char dst[5] = {0};
|
|
|
|
const char *tex_fmt = img.swizzle[0] ? img.swizzle : "rgba";
|
|
|
|
const char *dst_fmt = "rgba";
|
|
|
|
for (int i = 0; i < count; i++) {
|
|
|
|
src[i] = tex_fmt[i];
|
|
|
|
dst[i] = dst_fmt[*offset + i];
|
|
|
|
}
|
|
|
|
|
|
|
|
if (img.use_integer) {
|
|
|
|
uint64_t tex_max = 1ull << p->image_desc.component_full_bits;
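// e.g. if component_full_bits is 10, tex_max is 1024 and the raw integer
// sample range [0, 1023] is rescaled to [0.0, 1.0] by the multiplier below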
|
|
|
|
img.multiplier *= 1.0 / (tex_max - 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
GLSLF("color.%s = %f * vec4(texture(texture%d, texcoord%d)).%s;\n",
|
|
|
|
dst, img.multiplier, id, id, src);
|
|
|
|
|
|
|
|
*offset += count;
|
|
|
|
}
|
|
|
|
|
2016-03-05 11:38:51 +00:00
|
|
|
static void skip_unused(struct gl_video *p, int num_components)
|
|
|
|
{
|
|
|
|
for (int i = num_components; i < 4; i++)
|
|
|
|
GLSLF("color.%c = %f;\n", "rgba"[i], i < 3 ? 0.0 : 1.0);
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius handling has been cleaned up - filters always
have a radius now (there's no more radius -1), and instead get a new
.resizable attribute for when the radius is tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (including parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
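For orientation, the shape of the consolidated configuration can be sketched
from how the code below accesses it; this is inferred from usage and is not
the authoritative definition in the headers:

struct scaler_fun {
    char *name;       // e.g. "lanczos" for the kernel, "hanning" for the window
    float params[2];  // filter-specific parameters; NAN means "unset"
    float blur;       // blur factor; values <= 0 keep the filter's default
};

struct scaler_config {
    struct scaler_fun kernel;  // the filter kernel itself
    struct scaler_fun window;  // optional window applied to the kernel
    float radius;              // only honored if the kernel is .resizable
    float antiring;            // ignored for LUT reuse (see scaler_conf_eq)
    float clamp;
};

Everything reinit_scaler and pass_sample need now travels in one
struct scaler_config instead of a scaler index plus loose parameters.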
2015-03-26 00:55:32 +00:00
|
|
|
static void uninit_scaler(struct gl_video *p, struct scaler *scaler)
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
2015-03-13 23:32:20 +00:00
|
|
|
fbotex_uninit(&scaler->sep_fbo);
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
gl->DeleteTextures(1, &scaler->gl_lut);
|
|
|
|
scaler->gl_lut = 0;
|
|
|
|
scaler->kernel = NULL;
|
|
|
|
scaler->initialized = false;
|
|
|
|
}
|
|
|
|
|
2016-04-16 16:14:32 +00:00
|
|
|
static void hook_prelude(struct gl_video *p, const char *name, int id)
|
|
|
|
{
|
|
|
|
GLSLHF("#define %s texture%d\n", name, id);
|
|
|
|
GLSLHF("#define %s_pos texcoord%d\n", name, id);
|
|
|
|
GLSLHF("#define %s_size texture_size%d\n", name, id);
|
|
|
|
GLSLHF("#define %s_pt pixel_size%d\n", name, id);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool saved_tex_find(struct gl_video *p, const char *name,
|
|
|
|
struct img_tex *out)
|
|
|
|
{
|
|
|
|
if (!name || !out)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
for (int i = 0; i < p->saved_tex_num; i++) {
|
|
|
|
if (strcmp(p->saved_tex[i].name, name) == 0) {
|
|
|
|
*out = p->saved_tex[i].tex;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void saved_tex_store(struct gl_video *p, const char *name,
|
|
|
|
struct img_tex tex)
|
|
|
|
{
|
|
|
|
assert(name);
|
|
|
|
|
|
|
|
for (int i = 0; i < p->saved_tex_num; i++) {
|
|
|
|
if (strcmp(p->saved_tex[i].name, name) == 0) {
|
|
|
|
p->saved_tex[i].tex = tex;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
assert(p->saved_tex_num < MAX_SAVED_TEXTURES);
|
|
|
|
p->saved_tex[p->saved_tex_num++] = (struct saved_tex) {
|
|
|
|
.name = name,
|
|
|
|
.tex = tex
|
|
|
|
};
|
|
|
|
}
|
|
|
|
|
|
|
|
// Process hooks for a plane, saving the result and returning a new img_tex
|
|
|
|
// If 'trans' is NULL, the shader is forbidden from transforming tex
|
|
|
|
static struct img_tex pass_hook(struct gl_video *p, const char *name,
|
|
|
|
struct img_tex tex, struct gl_transform *trans)
|
|
|
|
{
|
|
|
|
if (!name)
|
|
|
|
return tex;
|
|
|
|
|
|
|
|
saved_tex_store(p, name, tex);
|
|
|
|
|
|
|
|
MP_DBG(p, "Running hooks for %s\n", name);
|
|
|
|
for (int i = 0; i < p->tex_hook_num; i++) {
|
|
|
|
struct tex_hook *hook = &p->tex_hooks[i];
|
|
|
|
|
|
|
|
if (strcmp(hook->hook_tex, name) != 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
// Bind all necessary textures and add them to the prelude
|
|
|
|
for (int t = 0; t < TEXUNIT_VIDEO_NUM; t++) {
|
|
|
|
const char *bind_name = hook->bind_tex[t];
|
|
|
|
struct img_tex bind_tex;
|
|
|
|
|
|
|
|
if (!bind_name)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
// This is a special name that means "currently hooked texture"
|
|
|
|
if (strcmp(bind_name, "HOOKED") == 0) {
|
|
|
|
int id = pass_bind(p, tex);
|
|
|
|
hook_prelude(p, "HOOKED", id);
|
|
|
|
hook_prelude(p, name, id);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!saved_tex_find(p, bind_name, &bind_tex)) {
|
|
|
|
// Clean up texture bindings and just return as-is, stop
|
|
|
|
// all further processing of this hook
|
|
|
|
MP_ERR(p, "Failed running hook for %s: No saved texture named"
|
|
|
|
" %s!\n", name, bind_name);
|
|
|
|
p->pass_tex_num -= t;
|
|
|
|
return tex;
|
|
|
|
}
|
|
|
|
|
|
|
|
hook_prelude(p, bind_name, pass_bind(p, bind_tex));
|
|
|
|
}
|
|
|
|
|
|
|
|
// Run the actual hook. This generates a series of GLSL shader
|
|
|
|
// instructions sufficient for drawing the hook's output
|
|
|
|
struct gl_transform hook_off = identity_trans;
|
|
|
|
hook->hook(p, tex, &hook_off, hook->priv);
|
|
|
|
|
|
|
|
int comps = hook->components ? hook->components : tex.components;
|
|
|
|
skip_unused(p, comps);
|
|
|
|
|
|
|
|
// Compute the updated FBO dimensions and store the result
|
|
|
|
struct mp_rect_f sz = {0, 0, tex.w, tex.h};
|
|
|
|
gl_transform_rect(hook_off, &sz);
|
|
|
|
int w = lroundf(fabs(sz.x1 - sz.x0));
|
|
|
|
int h = lroundf(fabs(sz.y1 - sz.y0));
|
2016-04-19 18:45:40 +00:00
|
|
|
|
|
|
|
assert(p->hook_fbo_num < MAX_SAVED_TEXTURES);
|
|
|
|
struct fbotex *fbo = &p->hook_fbos[p->hook_fbo_num++];
|
|
|
|
finish_pass_fbo(p, fbo, w, h, 0);
|
2016-04-16 16:14:32 +00:00
|
|
|
|
|
|
|
const char *store_name = hook->save_tex ? hook->save_tex : name;
|
2016-04-19 18:45:40 +00:00
|
|
|
struct img_tex saved_tex = img_tex_fbo(fbo, tex.type, comps);
|
2016-04-16 16:14:32 +00:00
|
|
|
|
|
|
|
// If the texture we're saving overwrites the "current" texture, also
|
|
|
|
// update the tex parameter so that the future loop cycles will use the
|
|
|
|
// updated values, and export the offset
|
|
|
|
if (strcmp(store_name, name) == 0) {
|
|
|
|
if (!trans && !gl_transform_eq(hook_off, identity_trans)) {
|
|
|
|
MP_ERR(p, "Hook tried changing size of unscalable texture %s!\n",
|
|
|
|
name);
|
|
|
|
return tex;
|
|
|
|
}
|
|
|
|
|
|
|
|
tex = saved_tex;
|
|
|
|
if (trans)
|
|
|
|
gl_transform_trans(hook_off, trans);
|
|
|
|
}
|
|
|
|
|
|
|
|
saved_tex_store(p, store_name, saved_tex);
|
|
|
|
}
|
|
|
|
|
|
|
|
return tex;
|
|
|
|
}
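To make the hook mechanism above more concrete, here is a hedged sketch of
what a hook entry could look like. It uses only the struct tex_hook fields
that pass_hook() actually touches (hook_tex, bind_tex, save_tex, components,
hook, priv); the example_hook name and the "LUMA" hook point are placeholders
for illustration:

static void example_hook(struct gl_video *p, struct img_tex tex,
                         struct gl_transform *trans, void *priv)
{
    // A real hook appends GLSL here. Thanks to hook_prelude(), that GLSL can
    // refer to HOOKED, HOOKED_pos, HOOKED_size and HOOKED_pt (plus the same
    // macro set for every texture named in bind_tex), and it is expected to
    // leave its result in "vec4 color". Writing to *trans tells pass_hook()
    // how the output is scaled/offset relative to the input; leaving it
    // untouched keeps the identity transform.
}

static const struct tex_hook example_entry = {
    .hook_tex = "LUMA",        // run when this texture is about to be used
    .bind_tex = {"HOOKED"},    // bind only the hooked texture itself
    .hook     = example_hook,
};

pass_hook() renders such a hook's output into one of p->hook_fbos and stores
it under the same name, so all later passes transparently pick up the
processed texture.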
|
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
// This can be used at any time in the middle of rendering to specify an
|
|
|
|
// optional hook point, which if triggered will render out to a new FBO and
|
|
|
|
// load the result back into vec4 color. Offsets applied by the hooks are
|
|
|
|
// accumulated in tex_trans, and the FBO is dimensioned according
|
|
|
|
// to p->texture_w/h
|
|
|
|
static void pass_opt_hook_point(struct gl_video *p, const char *name,
|
|
|
|
struct gl_transform *tex_trans)
|
|
|
|
{
|
|
|
|
if (!name)
|
|
|
|
return;
|
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
for (int i = 0; i < p->tex_hook_num; i++) {
|
|
|
|
struct tex_hook *hook = &p->tex_hooks[i];
|
|
|
|
|
|
|
|
if (strcmp(hook->hook_tex, name) == 0)
|
|
|
|
goto found;
|
|
|
|
|
|
|
|
for (int b = 0; b < TEXUNIT_VIDEO_NUM; b++) {
|
|
|
|
if (hook->bind_tex[b] && strcmp(hook->bind_tex[b], name) == 0)
|
|
|
|
goto found;
|
|
|
|
}
|
2016-04-19 18:45:40 +00:00
|
|
|
}
|
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
// Nothing uses this texture, don't bother storing it
|
|
|
|
return;
|
2016-04-19 18:45:40 +00:00
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
found:
|
2016-04-19 18:45:40 +00:00
|
|
|
assert(p->hook_fbo_num < MAX_SAVED_TEXTURES);
|
|
|
|
struct fbotex *fbo = &p->hook_fbos[p->hook_fbo_num++];
|
|
|
|
finish_pass_fbo(p, fbo, p->texture_w, p->texture_h, 0);
|
2016-04-20 23:33:13 +00:00
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
struct img_tex img = img_tex_fbo(fbo, PLANE_RGB, p->components);
|
|
|
|
img = pass_hook(p, name, img, tex_trans);
|
|
|
|
copy_img_tex(p, &(int){0}, img);
|
|
|
|
p->texture_w = img.w;
|
|
|
|
p->texture_h = img.h;
|
|
|
|
p->components = img.components;
|
|
|
|
}
|
2016-04-16 16:14:32 +00:00
|
|
|
|
2016-04-20 23:33:13 +00:00
|
|
|
static void load_shader(struct gl_video *p, struct bstr body)
|
2015-03-27 12:27:40 +00:00
|
|
|
{
|
2016-04-20 23:33:13 +00:00
|
|
|
gl_sc_hadd_bstr(p->sc, body);
|
2015-03-27 12:27:40 +00:00
|
|
|
gl_sc_uniform_f(p->sc, "random", (double)av_lfg_get(&p->lfg) / UINT32_MAX);
|
|
|
|
gl_sc_uniform_f(p->sc, "frame", p->frames_uploaded);
|
2016-02-22 21:07:04 +00:00
|
|
|
gl_sc_uniform_vec2(p->sc, "image_size", (GLfloat[]){p->image_params.w,
|
|
|
|
p->image_params.h});
|
2015-03-27 12:27:40 +00:00
|
|
|
}
|
|
|
|
|
2016-01-25 19:24:41 +00:00
|
|
|
static const char *get_custom_shader_fn(struct gl_video *p, const char *body)
|
|
|
|
{
|
|
|
|
if (!p->gl->es && strstr(body, "sample") && !strstr(body, "sample_pixel")) {
|
|
|
|
if (!p->custom_shader_fn_warned) {
|
|
|
|
MP_WARN(p, "sample() is deprecated in custom shaders. "
|
|
|
|
"Use sample_pixel()\n");
|
|
|
|
p->custom_shader_fn_warned = true;
|
|
|
|
}
|
|
|
|
return "sample";
|
|
|
|
}
|
|
|
|
return "sample_pixel";
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
// Semantic equality
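// (unlike ==, this treats two NaNs as equal, so unset NAN parameters
// compare as equal to each other)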
|
|
|
|
static bool double_seq(double a, double b)
|
|
|
|
{
|
|
|
|
return (isnan(a) && isnan(b)) || a == b;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool scaler_fun_eq(struct scaler_fun a, struct scaler_fun b)
|
|
|
|
{
|
|
|
|
if ((a.name && !b.name) || (b.name && !a.name))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return ((!a.name && !b.name) || strcmp(a.name, b.name) == 0) &&
|
|
|
|
double_seq(a.params[0], b.params[0]) &&
|
|
|
|
double_seq(a.params[1], b.params[1]) &&
|
|
|
|
a.blur == b.blur;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool scaler_conf_eq(struct scaler_config a, struct scaler_config b)
|
|
|
|
{
|
|
|
|
// Note: antiring isn't compared because it doesn't affect LUT
|
|
|
|
// generation
|
|
|
|
return scaler_fun_eq(a.kernel, b.kernel) &&
|
|
|
|
scaler_fun_eq(a.window, b.window) &&
|
2015-08-20 19:45:58 +00:00
|
|
|
a.radius == b.radius &&
|
|
|
|
a.clamp == b.clamp;
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void reinit_scaler(struct gl_video *p, struct scaler *scaler,
|
|
|
|
const struct scaler_config *conf,
|
|
|
|
double scale_factor,
|
|
|
|
int sizes[])
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
if (scaler_conf_eq(scaler->conf, *conf) &&
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
scaler->scale_factor == scale_factor &&
|
|
|
|
scaler->initialized)
|
2014-04-20 19:30:23 +00:00
|
|
|
return;
|
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
uninit_scaler(p, scaler);
|
2014-04-20 19:30:23 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
scaler->conf = *conf;
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
scaler->scale_factor = scale_factor;
|
|
|
|
scaler->insufficient = false;
|
|
|
|
scaler->initialized = true;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
const struct filter_kernel *t_kernel = mp_find_filter_kernel(conf->kernel.name);
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
if (!t_kernel)
|
|
|
|
return;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
scaler->kernel_storage = *t_kernel;
|
|
|
|
scaler->kernel = &scaler->kernel_storage;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
const char *win = conf->window.name;
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
e.g. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
Regarding the last point, in the future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
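As a self-contained illustration of what "applying a window to a kernel"
means, here is the generic textbook form of a windowed sinc; it is a sketch,
not the exact code in filter_kernels.c:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Unnormalized sinc, the classic resampling kernel.
static double sinc(double x)
{
    return x == 0.0 ? 1.0 : sin(M_PI * x) / (M_PI * x);
}

// Hann window, defined on |x| <= 1.
static double hann(double x)
{
    return 0.5 + 0.5 * cos(M_PI * x);
}

// Weight at distance x from the sample point: the window is stretched over
// the kernel's radius so it falls to zero exactly at the edge of the support.
static double windowed_weight(double x, double radius)
{
    if (fabs(x) >= radius)
        return 0.0;
    return sinc(x) * hann(x / radius);
}

With this separation, the same window (here Hann) can be combined with any
kernel and any radius, which is exactly the "mix and match" described above.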
2015-03-25 03:40:28 +00:00
|
|
|
if (!win || !win[0])
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
win = t_kernel->window; // fall back to the scaler's default window
|
vo_opengl: separate kernel and window
2015-03-25 03:40:28 +00:00
|
|
|
const struct filter_window *t_window = mp_find_filter_window(win);
|
|
|
|
if (t_window)
|
|
|
|
scaler->kernel->w = *t_window;
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
for (int n = 0; n < 2; n++) {
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
if (!isnan(conf->kernel.params[n]))
|
|
|
|
scaler->kernel->f.params[n] = conf->kernel.params[n];
|
|
|
|
if (!isnan(conf->window.params[n]))
|
|
|
|
scaler->kernel->w.params[n] = conf->window.params[n];
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
}
|
2014-04-20 19:37:18 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
if (conf->kernel.blur > 0.0)
|
|
|
|
scaler->kernel->f.blur = conf->kernel.blur;
|
|
|
|
if (conf->window.blur > 0.0)
|
|
|
|
scaler->kernel->w.blur = conf->window.blur;
|
2014-04-20 19:30:23 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
2015-03-26 00:55:32 +00:00
|
|
|
if (scaler->kernel->f.resizable && conf->radius > 0.0)
|
|
|
|
scaler->kernel->f.radius = conf->radius;
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-08-20 19:45:58 +00:00
|
|
|
scaler->kernel->clamp = conf->clamp;
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
scaler->insufficient = !mp_init_filter(scaler->kernel, sizes, scale_factor);
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-11-19 20:20:40 +00:00
|
|
|
if (scaler->kernel->polar && (gl->mpgl_caps & MPGL_CAP_1D_TEX)) {
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
scaler->gl_target = GL_TEXTURE_1D;
|
|
|
|
} else {
|
|
|
|
scaler->gl_target = GL_TEXTURE_2D;
|
|
|
|
}
|
|
|
|
|
|
|
|
int size = scaler->kernel->size;
|
|
|
|
int elems_per_pixel = 4;
|
|
|
|
if (size == 1) {
|
|
|
|
elems_per_pixel = 1;
|
|
|
|
} else if (size == 2) {
|
|
|
|
elems_per_pixel = 2;
|
|
|
|
} else if (size == 6) {
|
|
|
|
elems_per_pixel = 3;
|
|
|
|
}
|
|
|
|
int width = size / elems_per_pixel;
|
|
|
|
assert(size == width * elems_per_pixel);
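// e.g. a size 6 kernel packs 3 weights per texel (elems_per_pixel == 3),
// so each row of the LUT texture is only 2 texels wide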
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_float16_format(gl, elems_per_pixel);
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
GLenum target = scaler->gl_target;
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + TEXUNIT_SCALERS + scaler->index);
|
|
|
|
|
|
|
|
if (!scaler->gl_lut)
|
|
|
|
gl->GenTextures(1, &scaler->gl_lut);
|
|
|
|
|
|
|
|
gl->BindTexture(target, scaler->gl_lut);
|
|
|
|
|
2015-12-05 19:14:23 +00:00
|
|
|
scaler->lut_size = 1 << p->opts.scaler_lut_size;
|
2015-12-05 18:54:25 +00:00
|
|
|
|
|
|
|
float *weights = talloc_array(NULL, float, scaler->lut_size * size);
|
|
|
|
mp_compute_lut(scaler->kernel, scaler->lut_size, weights);
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
if (target == GL_TEXTURE_1D) {
|
2015-12-05 18:54:25 +00:00
|
|
|
gl->TexImage1D(target, 0, fmt->internal_format, scaler->lut_size,
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
0, fmt->format, GL_FLOAT, weights);
|
|
|
|
} else {
|
2015-12-05 18:54:25 +00:00
|
|
|
gl->TexImage2D(target, 0, fmt->internal_format, width, scaler->lut_size,
|
vo_opengl: refactor shader generation (part 1)
2015-03-12 20:57:54 +00:00
|
|
|
0, fmt->format, GL_FLOAT, weights);
|
|
|
|
}
|
|
|
|
|
|
|
|
talloc_free(weights);
|
|
|
|
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
|
|
|
|
if (target != GL_TEXTURE_1D)
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
|
|
|
|
|
|
|
debug_check_gl(p, "after initializing scaler");
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-09-05 12:03:00 +00:00
|
|
|
// Special helper for sampling from two separated stages
|
vo_opengl: refactor pass_read_video and texture binding
2016-03-05 10:29:19 +00:00
|
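
A rough sketch of the pattern described above, using the same helpers that
appear in the functions below (illustrative only, not an actual call site):

    // describe the source: size, type, components, multiplier and transform
    //     struct img_tex tex = img_tex_fbo(&scaler->sep_fbo, src.type,
    //                                      src.components);
    //     tex.transform = t_x;
    //     sampler_prelude(p->sc, pass_bind(p, tex));  // texture unit chosen here
    // i.e. an img_tex is only tied to a texture unit (via pass_bind) at the
    // moment a pass actually samples from it.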

// Special helper for sampling from two separated stages
static void pass_sample_separated(struct gl_video *p, struct img_tex src,
                                  struct scaler *scaler, int w, int h)
{
    // Separate the transformation into x and y components, per pass
    struct gl_transform t_x = {
        .m = {{src.transform.m[0][0], 0.0}, {src.transform.m[1][0], 1.0}},
        .t = {src.transform.t[0], 0.0},
    };
    struct gl_transform t_y = {
        .m = {{1.0, src.transform.m[0][1]}, {0.0, src.transform.m[1][1]}},
        .t = {0.0, src.transform.t[1]},
    };
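
    // For the common case of a plain scaling transform with no off-diagonal
    // terms, e.g. m = {{sx, 0.0}, {0.0, sy}}, t = {tx, ty}:
    //   t_y maps (x, y) -> (x, sy * y + ty)   (first pass, y only)
    //   t_x maps (x, y) -> (sx * x + tx, y)   (second pass, x only)
    // so applying the two passes in sequence reproduces the full transform.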

    // First pass (scale only in the y dir)
    src.transform = t_y;
    sampler_prelude(p->sc, pass_bind(p, src));
    GLSLF("// pass 1\n");
    pass_sample_separated_gen(p->sc, scaler, 0, 1);
    GLSLF("color *= %f;\n", src.multiplier);
    finish_pass_fbo(p, &scaler->sep_fbo, src.w, h, FBOTEX_FUZZY_H);

    // Second pass (scale only in the x dir)
    src = img_tex_fbo(&scaler->sep_fbo, src.type, src.components);
    src.transform = t_x;
    sampler_prelude(p->sc, pass_bind(p, src));
    GLSLF("// pass 2\n");
    pass_sample_separated_gen(p->sc, scaler, 1, 0);
}
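
// Splitting the scaler like this is the usual separable-filter optimization:
// applying an NxN kernel directly would need N*N taps per output pixel, while
// the vertical pass plus horizontal pass above need only 2*N taps, at the
// cost of one intermediate render target (scaler->sep_fbo).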

vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (including parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.

vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (it introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
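
// For illustration, the renamed options from the message above are used like
// this (scaler names and values here are arbitrary examples, not defaults):
//
//     scale=lanczos   scale-radius=3.0
//     dscale=mitchell dscale-radius=2.0
//
// i.e. downscaling now has its own scaler configuration instead of reusing
// scale-param1/scale-radius.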

// Sample from img_tex, with the src rectangle given by it.
// The dst rectangle is implicit by what the caller will do next, but w and h
// must still be what is going to be used (to dimension FBOs correctly).
// This will write the scaled contents to the vec4 "color".
// The scaler unit is initialized by this function; in order to avoid cache
// thrashing, the scaler unit should usually use the same parameters.
static void pass_sample(struct gl_video *p, struct img_tex tex,
                        struct scaler *scaler, const struct scaler_config *conf,
                        double scale_factor, int w, int h)
{
    reinit_scaler(p, scaler, conf, scale_factor, filter_sizes);

    bool is_separated = scaler->kernel && !scaler->kernel->polar;

    // Set up the transformation+prelude and bind the texture, for everything
    // other than separated scaling (which does this in the subfunction)
    if (!is_separated)
        sampler_prelude(p->sc, pass_bind(p, tex));

    // Dispatch the scaler. They're all wildly different.
    const char *name = scaler->conf.kernel.name;
    if (strcmp(name, "bilinear") == 0) {
        GLSL(color = texture(tex, pos);)
    } else if (strcmp(name, "bicubic_fast") == 0) {
        pass_sample_bicubic_fast(p->sc);
    } else if (strcmp(name, "oversample") == 0) {
        pass_sample_oversample(p->sc, scaler, w, h);
    } else if (strcmp(name, "custom") == 0) {
        struct bstr body = load_cached_file(p, p->opts.scale_shader);
        if (body.start) {
            load_shader(p, body);
            const char *fn_name = get_custom_shader_fn(p, body.start);
            GLSLF("// custom scale-shader\n");
            GLSLF("color = %s(tex, pos, size);\n", fn_name);
        } else {
            p->opts.scale_shader = NULL;
        }
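
        // For reference, a custom scale-shader is plain GLSL defining the
        // entry point that get_custom_shader_fn() reports; judging from the
        // GLSLF() line above, it has to look roughly like
        //
        //     vec4 sample_pixel(sampler2D tex, vec2 pos, vec2 size) { ... }
        //
        // (the name "sample_pixel" is only an example; the real name is
        // whatever get_custom_shader_fn() returns for the loaded file).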
    } else if (scaler->kernel && scaler->kernel->polar) {
        pass_sample_polar(p->sc, scaler);
    } else if (scaler->kernel) {
        pass_sample_separated(p, tex, scaler, w, h);
    } else {
        // Should never happen
        abort();
    }
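
    // Note the dispatch order above: the specially named paths (bilinear,
    // bicubic_fast, oversample, custom) win by name, any remaining kernel
    // marked as polar is sampled in a single polar pass, and every other
    // kernel is treated as separable and goes through pass_sample_separated.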

    // Apply any required multipliers. Separated scaling already does this in
    // its first stage
    if (!is_separated)
        GLSLF("color *= %f;\n", tex.multiplier);

    // Micro-optimization: Avoid scaling unneeded channels
    skip_unused(p, tex.components);
}

// Get the number of passes for prescaler, with given display size.
static int get_prescale_passes(struct gl_video *p)
{
    if (!p->opts.prescale_luma)
        return 0;

    // The downscaling threshold check is turned off.
    if (p->opts.prescale_downscaling_threshold < 1.0f)
        return p->opts.prescale_passes;

    double scale_factors[2];
    get_scale_factors(p, true, scale_factors);

    int passes = 0;
    for (; passes < p->opts.prescale_passes; passes++) {
        // The scale factor happens to be the same for superxbr and nnedi3.
        scale_factors[0] /= 2;
        scale_factors[1] /= 2;

        if (1.0f / scale_factors[0] > p->opts.prescale_downscaling_threshold)
            break;
        if (1.0f / scale_factors[1] > p->opts.prescale_downscaling_threshold)
            break;
    }

    return passes;
}
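
// Worked example (the threshold, pass count, and the assumption that
// get_scale_factors() yields the display/source ratio are hypothetical here):
// with prescale_passes = 2 and prescale_downscaling_threshold = 2.0, upscaling
// 1280x720 to 1920x1080 starts with scale factors of 1.5. After one prescale
// pass the remaining factor is 0.75 (a ~1.33x downscale, within the threshold);
// after a second it would be 0.375 (a ~2.67x downscale, rejected), so a single
// pass is used.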
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Upload the NNEDI3 UBO weights only if needed
|
|
|
|
static void upload_nnedi3_weights(struct gl_video *p)
|
2015-10-26 22:43:48 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
    GL *gl = p->gl;

    if (p->opts.nnedi3_opts->upload == NNEDI3_UPLOAD_UBO &&
        !p->nnedi3_weights_buffer)
    {
        gl->GenBuffers(1, &p->nnedi3_weights_buffer);
        gl->BindBufferBase(GL_UNIFORM_BUFFER, 0, p->nnedi3_weights_buffer);

        int size;
        const float *weights = get_nnedi3_weights(p->opts.nnedi3_opts, &size);

        MP_VERBOSE(p, "Uploading NNEDI3 weights via UBO (size=%d)\n", size);

        // We don't know the endianness of GPU, just assume it's LE
        gl->BufferData(GL_UNIFORM_BUFFER, size, weights, GL_STATIC_DRAW);
    }
}

// Applies a single pass of the prescaler, and accumulates the offset in
// pass_transform.
static void pass_prescale_luma_step(struct gl_video *p, struct img_tex tex,
                                    struct gl_transform *step_transform,
                                    int step)
{
    int id = pass_bind(p, tex);
    int planes = tex.components;

    switch (p->opts.prescale_luma) {
    case 1:
        assert(planes == 1);
        pass_superxbr(p->sc, id, step, tex.multiplier,
                      p->opts.superxbr_opts, step_transform);
        break;
    case 2:
        upload_nnedi3_weights(p);
        pass_nnedi3(p->gl, p->sc, planes, id, step, tex.multiplier,
                    p->opts.nnedi3_opts, step_transform, tex.gl_target);
        break;
    default:
        abort();
    }

    skip_unused(p, planes);
}

// Returns true if two img_texs are semantically equivalent (same metadata)
static bool img_tex_equiv(struct img_tex a, struct img_tex b)
{
    return a.type == b.type &&
           a.components == b.components &&
           a.multiplier == b.multiplier &&
           a.gl_target == b.gl_target &&
           a.use_integer == b.use_integer &&
           a.tex_w == b.tex_w &&
           a.tex_h == b.tex_h &&
           a.w == b.w &&
           a.h == b.h &&
           gl_transform_eq(a.transform, b.transform) &&
           strcmp(a.swizzle, b.swizzle) == 0;
}

static void pass_add_hook(struct gl_video *p, struct tex_hook hook)
{
    if (p->tex_hook_num < MAX_TEXTURE_HOOKS) {
        p->tex_hooks[p->tex_hook_num++] = hook;
    } else {
        MP_ERR(p, "Too many hooks! Limit is %d.\n", MAX_TEXTURE_HOOKS);
    }
}

// Adds a hook multiple times, one per name. The last name must be NULL to
// signal the end of the argument list.
#define HOOKS(...) ((const char*[]){__VA_ARGS__, NULL})
static void pass_add_hooks(struct gl_video *p, struct tex_hook hook,
                           const char **names)
{
    for (int i = 0; names[i] != NULL; i++) {
        hook.hook_tex = names[i];
        pass_add_hook(p, hook);
    }
}
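
// Usage sketch (mirrors gl_video_setup_hooks() further down): register the
// same hook callback for several textures at once, e.g.
//     pass_add_hooks(p, (struct tex_hook) {.hook = deband_hook},
//                    HOOKS("LUMA", "CHROMA", "RGB", "XYZ"));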

static void deband_hook(struct gl_video *p, struct img_tex tex,
                        struct gl_transform *trans, void *priv)
{
    // We could use the hook binding mechanism here, but the existing code
    // already assumes we just know an ID, so just do this for simplicity
    int id = pass_bind(p, tex);
    pass_sample_deband(p->sc, p->opts.deband_opts, id, tex.multiplier,
                       tex.gl_target, &p->lfg);
    skip_unused(p, tex.components);
}

static void prescale_hook(struct gl_video *p, struct img_tex tex,
                          struct gl_transform *trans, void *priv)
{
    struct gl_transform step_trans = identity_trans;
    pass_prescale_luma_step(p, tex, &step_trans, 0);
    gl_transform_trans(step_trans, trans);

    // We render out an FBO *inside* this hook, which is normally quite
    // unusual but here it allows us to work around the lack of real closures.
    // Unfortunately it means we need to duplicate some work to compute the
    // new FBO size
    struct fbotex *fbo = priv;
    int w = tex.w * (int)step_trans.m[0][0],
        h = tex.h * (int)step_trans.m[1][1];
    finish_pass_fbo(p, fbo, w, h, 0);
    tex = img_tex_fbo(fbo, tex.type, tex.components);

    pass_prescale_luma_step(p, tex, &step_trans, 1);
    gl_transform_trans(step_trans, trans);
}
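
// unsharp_hook() below reuses the unsharp masking GLSL through the hook
// mechanism: the #defines it emits alias the names referenced by the GLSL
// that pass_sample_unsharp() generates (tex, pos, pt) to the HOOKED_*
// identifiers provided by the hook.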
static void unsharp_hook(struct gl_video *p, struct img_tex tex,
                         struct gl_transform *trans, void *priv)
{
    GLSLF("#define tex HOOKED\n");
    GLSLF("#define pos HOOKED_pos\n");
    GLSLF("#define pt HOOKED_pt\n");
    pass_sample_unsharp(p->sc, p->opts.unsharp);
}

static void user_hook_old(struct gl_video *p, struct img_tex tex,
                          struct gl_transform *trans, void *priv)
{
    const char *body = priv;
    assert(body);

    GLSLHF("#define pixel_size HOOKED_pt\n");
    load_shader(p, bstr0(body));
    const char *fn_name = get_custom_shader_fn(p, body);
    GLSLF("// custom shader\n");
    GLSLF("color = %s(HOOKED, HOOKED_pos, HOOKED_size);\n", fn_name);
}

// Returns 1.0 on failure to at least create a legal FBO
static float eval_szexpr(struct gl_video *p, struct img_tex tex,
                         struct szexp expr[MAX_SZEXP_SIZE])
{
    float stack[MAX_SZEXP_SIZE] = {0};
    int idx = 0; // points to next element to push

    for (int i = 0; i < MAX_SZEXP_SIZE; i++) {
        switch (expr[i].tag) {
        case SZEXP_END:
            goto done;

        case SZEXP_CONST:
            // Since our SZEXPs are bound by MAX_SZEXP_SIZE, it should be
            // impossible to overflow the stack
            assert(idx < MAX_SZEXP_SIZE);
            stack[idx++] = expr[i].val.cval;
            continue;

        case SZEXP_OP2:
            if (idx < 2) {
                MP_WARN(p, "Stack underflow in RPN expression!\n");
                return 1.0;
            }

            // Pop the operands in reverse order
            float op2 = stack[--idx], op1 = stack[--idx], res = 0.0;
            switch (expr[i].val.op) {
            case SZEXP_OP_ADD: res = op1 + op2; break;
            case SZEXP_OP_SUB: res = op1 - op2; break;
            case SZEXP_OP_MUL: res = op1 * op2; break;
            case SZEXP_OP_DIV: res = op1 / op2; break;
            default: abort();
            }

            if (isnan(res)) {
                MP_WARN(p, "Illegal operation in RPN expression!\n");
                return 1.0;
            }

            stack[idx++] = res;
            continue;

        case SZEXP_VAR_W:
        case SZEXP_VAR_H: {
            struct bstr name = expr[i].val.varname;
            struct img_tex var_tex;

            // HOOKED is a special case
            if (bstr_equals0(name, "HOOKED")) {
                var_tex = tex;
                goto found_tex;
            }

            for (int o = 0; o < p->saved_tex_num; o++) {
                if (bstr_equals0(name, p->saved_tex[o].name)) {
                    var_tex = p->saved_tex[o].tex;
                    goto found_tex;
                }
            }

            char *errname = bstrto0(NULL, name);
            MP_WARN(p, "Texture %s not found in RPN expression!\n", errname);
            talloc_free(errname);
            return 1.0;

        found_tex:
            stack[idx++] = (expr[i].tag == SZEXP_VAR_W) ? var_tex.w : var_tex.h;
            continue;
        }
        }
    }

done:
    // Return the single stack element
    if (idx != 1) {
        MP_WARN(p, "Malformed stack after RPN expression!\n");
        return 1.0;
    }

    return stack[0];
}
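
// Worked example: a size expression equivalent to "HOOKED.w 2 *" would be
// stored as SZEXP_VAR_W("HOOKED"), SZEXP_CONST(2), SZEXP_OP2(MUL), SZEXP_END.
// Evaluation pushes the hooked texture's width, pushes 2, then pops both and
// pushes their product, leaving exactly one element (the result) on the stack.
// (The textual RPN syntax shown here is only illustrative; parsing happens in
// the user shader parser, not in this function.)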

static void user_hook(struct gl_video *p, struct img_tex tex,
                      struct gl_transform *trans, void *priv)
{
    struct gl_user_shader *shader = priv;
    assert(shader);

    load_shader(p, shader->pass_body);
    GLSLF("// custom hook\n");
    GLSLF("color = hook();\n");

    float w = eval_szexpr(p, tex, shader->width);
    float h = eval_szexpr(p, tex, shader->height);

    *trans = (struct gl_transform){{{w / tex.w, 0}, {0, h / tex.h}}};
    gl_transform_trans(shader->offset, trans);
}
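
// Example: if the pass declares an output size of w = 2 * tex.w and
// h = 2 * tex.h, the resulting transform scales by a factor of 2 on each
// axis before the shader's own offset is applied, so downstream passes see
// the doubled size.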

static void user_hook_free(struct tex_hook *hook)
{
    talloc_free((void *)hook->hook_tex);
    talloc_free((void *)hook->save_tex);
    for (int i = 0; i < TEXUNIT_VIDEO_NUM; i++)
        talloc_free((void *)hook->bind_tex[i]);
    talloc_free(hook->priv);
}

static void pass_hook_user_shaders_old(struct gl_video *p, const char *name,
                                       char **shaders)
{
    assert(name);
    if (!shaders)
        return;

    for (int n = 0; shaders[n] != NULL; n++) {
        const char *body = load_cached_file(p, shaders[n]).start;
        if (body) {
            pass_add_hook(p, (struct tex_hook) {
                .hook_tex = name,
                .bind_tex = {"HOOKED"},
                .hook = user_hook_old,
                .priv = (void *)body,
            });
        }
    }
}

static void pass_hook_user_shaders(struct gl_video *p, char **shaders)
{
    if (!shaders)
        return;

    for (int n = 0; shaders[n] != NULL; n++) {
        struct bstr file = load_cached_file(p, shaders[n]);
        struct gl_user_shader out;
        while (parse_user_shader_pass(p->log, &file, &out)) {
            struct tex_hook hook = {
                .components = out.components,
                .hook = user_hook,
                .free = user_hook_free,
            };

            for (int i = 0; i < SHADER_MAX_HOOKS; i++) {
                hook.hook_tex = bstrdup0(p, out.hook_tex[i]);
                if (!hook.hook_tex)
                    continue;

                struct gl_user_shader *out_copy = talloc_ptrtype(p, out_copy);
                *out_copy = out;
                hook.priv = out_copy;
                for (int o = 0; o < SHADER_MAX_BINDS; o++)
                    hook.bind_tex[o] = bstrdup0(p, out.bind_tex[o]);
                hook.save_tex = bstrdup0(p, out.save_tex);
                pass_add_hook(p, hook);
            }
        }
    }
}
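
// Unlike the old mechanism above, each parsed user shader pass can hook
// several textures and declare its own bind/save targets, so one tex_hook
// (holding a private copy of the parsed pass) is registered per hook_tex
// entry.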

static void gl_video_setup_hooks(struct gl_video *p)
{
    if (p->opts.deband) {
        pass_add_hooks(p, (struct tex_hook) {.hook = deband_hook},
                       HOOKS("LUMA", "CHROMA", "RGB", "XYZ"));
    }

    int prescale_passes = get_prescale_passes(p);
    for (int i = 0; i < prescale_passes; i++) {
        pass_add_hook(p, (struct tex_hook) {
            .hook_tex = "LUMA",
            .hook = prescale_hook,
            .priv = &p->prescale_fbo[i],
        });
    }

    if (p->opts.unsharp != 0.0) {
        pass_add_hook(p, (struct tex_hook) {
            .hook_tex = "MAIN",
            .bind_tex = {"HOOKED"},
            .hook = unsharp_hook,
        });
    }

    pass_hook_user_shaders_old(p, "MAIN", p->opts.pre_shaders);
    pass_hook_user_shaders_old(p, "SCALED", p->opts.post_shaders);
    pass_hook_user_shaders(p, p->opts.user_shaders);
}

// sample from video textures, set "color" variable to yuv value
static void pass_read_video(struct gl_video *p)
{
    struct img_tex tex[4];
    struct gl_transform offsets[4];
    pass_get_img_tex(p, &p->image, tex, offsets);

    // To keep the code as simple as possible, we currently run all shader
    // stages even if they would be unnecessary (e.g. no hooks for a texture).
    // In the future, deferred img_tex should optimize this away.

    // Merge semantically identical textures. This loop is done from back
    // to front so that merged textures end up in the right order while
    // simultaneously allowing us to skip unnecessary merges
    for (int n = 3; n >= 0; n--) {
        if (tex[n].type == PLANE_NONE)
            continue;

        int first = n;
        int num = 0;

        for (int i = 0; i < n; i++) {
            if (img_tex_equiv(tex[n], tex[i]) &&
                gl_transform_eq(offsets[n], offsets[i]))
            {
                GLSLF("// merging plane %d ...\n", i);
                copy_img_tex(p, &num, tex[i]);
                first = MPMIN(first, i);
                memset(&tex[i], 0, sizeof(tex[i]));
            }
        }

        if (num > 0) {
            GLSLF("// merging plane %d ... into %d\n", n, first);
            copy_img_tex(p, &num, tex[n]);
            finish_pass_fbo(p, &p->merge_fbo[n], tex[n].w, tex[n].h, 0);
            tex[first] = img_tex_fbo(&p->merge_fbo[n], tex[n].type, num);
            memset(&tex[n], 0, sizeof(tex[n]));
        }
    }
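
    // Example: for plain yuv420p the two chroma planes carry identical
    // metadata and offsets, so they are merged into a single two-component
    // FBO here, and the chroma scaler later runs once instead of per plane.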

    // If any textures are still in integer format by this point, we need
    // to introduce an explicit conversion pass to avoid breaking hooks/scaling
    for (int n = 0; n < 4; n++) {
        if (tex[n].use_integer) {
            GLSLF("// use_integer fix for plane %d\n", n);

            copy_img_tex(p, &(int){0}, tex[n]);
            finish_pass_fbo(p, &p->integer_fbo[n], tex[n].w, tex[n].h, 0);
            tex[n] = img_tex_fbo(&p->integer_fbo[n], tex[n].type,
                                 tex[n].components);
        }
    }

    // Dispatch the hooks for all of these textures, saving and perhaps
    // modifying them in the process
    for (int n = 0; n < 4; n++) {
        const char *name;
        switch (tex[n].type) {
        case PLANE_RGB:    name = "RGB";    break;
        case PLANE_LUMA:   name = "LUMA";   break;
        case PLANE_CHROMA: name = "CHROMA"; break;
        case PLANE_ALPHA:  name = "ALPHA";  break;
        case PLANE_XYZ:    name = "XYZ";    break;
        default: continue;
        }

        tex[n] = pass_hook(p, name, tex[n], &offsets[n]);
    }

    // At this point all planes are finalized but they may not be at the
    // required size yet. Furthermore, they may have texture offsets that
    // require realignment. For lack of something better to do, we assume
    // the rgb/luma texture is the "reference" and scale everything else
    // to match.
    for (int n = 0; n < 4; n++) {
        switch (tex[n].type) {
        case PLANE_RGB:
        case PLANE_XYZ:
        case PLANE_LUMA: break;
        default: continue;
        }

        p->texture_w = tex[n].w;
        p->texture_h = tex[n].h;
        p->texture_offset = offsets[n];
        break;
    }

    // Compute the reference rect
    struct mp_rect_f src = {0.0, 0.0, p->image_params.w, p->image_params.h};
    struct mp_rect_f ref = src;
    gl_transform_rect(p->texture_offset, &ref);
    MP_DBG(p, "ref rect: {%f %f} {%f %f}\n", ref.x0, ref.y0, ref.x1, ref.y1);

    // Explicitly scale all of the textures that don't match
    for (int n = 0; n < 4; n++) {
        if (tex[n].type == PLANE_NONE)
            continue;

        // If the planes are aligned identically, we will end up with the
        // exact same source rectangle.
        struct mp_rect_f rect = src;
        gl_transform_rect(offsets[n], &rect);
        MP_DBG(p, "rect[%d]: {%f %f} {%f %f}\n", n,
               rect.x0, rect.y0, rect.x1, rect.y1);

        if (mp_rect_f_seq(ref, rect))
            continue;

        // If the rectangles differ, then our planes have a different
        // alignment and/or size. First of all, we have to compute the
        // corrections required to meet the target rectangle
        struct gl_transform fix = {
            .m = {{(ref.x1 - ref.x0) / (rect.x1 - rect.x0), 0.0},
                  {0.0, (ref.y1 - ref.y0) / (rect.y1 - rect.y0)}},
            .t = {ref.x0, ref.y0},
        };

        // Since the scale in texture space is different from the scale in
        // absolute terms, we have to scale the coefficients down to be
        // relative to the texture's physical dimensions and local offset
        struct gl_transform scale = {
            .m = {{(float)tex[n].w / p->texture_w, 0.0},
                  {0.0, (float)tex[n].h / p->texture_h}},
            .t = {-rect.x0, -rect.y0},
        };
        gl_transform_trans(scale, &fix);
        MP_DBG(p, "-> fix[%d] = {%f %f} + off {%f %f}\n", n,
               fix.m[0][0], fix.m[1][1], fix.t[0], fix.t[1]);

        // Since the texture transform is a function of the texture coordinates
        // to texture space, rather than the other way around, we have to
        // actually apply the *inverse* of this. Fortunately, calculating
        // the inverse is relatively easy here.
        fix.m[0][0] = 1.0 / fix.m[0][0];
        fix.m[1][1] = 1.0 / fix.m[1][1];
        fix.t[0] = fix.m[0][0] * -fix.t[0];
        fix.t[1] = fix.m[1][1] * -fix.t[1];
        gl_transform_trans(fix, &tex[n].transform);
|
|
|
|
|
|
|
|
int scaler_id = -1;
|
|
|
|
const char *name = NULL;
|
|
|
|
switch (tex[n].type) {
|
|
|
|
case PLANE_RGB:
|
|
|
|
case PLANE_LUMA:
|
|
|
|
case PLANE_XYZ:
|
|
|
|
scaler_id = SCALER_SCALE;
|
|
|
|
// these aren't worth hooking, fringe hypothetical cases only
|
|
|
|
break;
|
|
|
|
case PLANE_CHROMA:
|
|
|
|
scaler_id = SCALER_CSCALE;
|
|
|
|
name = "CHROMA_SCALED";
|
|
|
|
break;
|
|
|
|
case PLANE_ALPHA:
|
|
|
|
// alpha always uses bilinear
|
|
|
|
name = "ALPHA_SCALED";
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
|
2016-04-16 16:14:32 +00:00
|
|
|
if (scaler_id < 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
const struct scaler_config *conf = &p->opts.scaler[scaler_id];
|
|
|
|
struct scaler *scaler = &p->scaler[scaler_id];
|
|
|
|
|
|
|
|
// bilinear scaling is a free no-op thanks to GPU sampling
|
|
|
|
if (strcmp(conf->kernel.name, "bilinear") != 0) {
|
|
|
|
GLSLF("// upscaling plane %d\n", n);
|
|
|
|
pass_sample(p, tex[n], scaler, conf, 1.0, p->texture_w, p->texture_h);
|
|
|
|
finish_pass_fbo(p, &p->scale_fbo[n], p->texture_w, p->texture_h,
|
|
|
|
FBOTEX_FUZZY);
|
|
|
|
tex[n] = img_tex_fbo(&p->scale_fbo[n], tex[n].type, tex[n].components);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Run any post-scaling hooks
|
|
|
|
tex[n] = pass_hook(p, name, tex[n], NULL);
|
2015-03-27 12:27:40 +00:00
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// All planes are of the same size and properly aligned at this point
|
|
|
|
GLSLF("// combining planes\n");
|
|
|
|
int coord = 0;
|
|
|
|
for (int i = 0; i < 4; i++) {
|
|
|
|
if (tex[i].type != PLANE_NONE)
|
|
|
|
copy_img_tex(p, &coord, tex[i]);
|
|
|
|
}
|
2016-03-05 11:38:51 +00:00
|
|
|
p->components = coord;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
}
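The inversion step inside pass_read_video above works because the "fix" transform is purely diagonal-affine (v -> m*v + t with a diagonal m), so inverting it is just m' = 1/m, t' = -t/m. A minimal standalone sketch of those same four assignments, using a hypothetical diag_xform struct as a stand-in for mpv's struct gl_transform, with a round-trip check:

/* Illustrative sketch, not part of video.c. */
#include <stdio.h>

struct diag_xform {
    double m[2]; // per-axis scale (diagonal of the matrix)
    double t[2]; // per-axis offset
};

static void apply(struct diag_xform a, double v[2])
{
    for (int i = 0; i < 2; i++)
        v[i] = a.m[i] * v[i] + a.t[i];
}

static struct diag_xform invert(struct diag_xform a)
{
    struct diag_xform r;
    for (int i = 0; i < 2; i++) {
        // Same four operations as the fix.m/fix.t assignments above
        r.m[i] = 1.0 / a.m[i];
        r.t[i] = r.m[i] * -a.t[i];
    }
    return r;
}

int main(void)
{
    struct diag_xform fix = { .m = {2.0, 0.5}, .t = {3.0, -1.0} };
    double v[2] = {10.0, 20.0};

    apply(fix, v);          // forward: {23.0, 9.0}
    apply(invert(fix), v);  // inverse recovers the original point
    printf("%g %g\n", v[0], v[1]); // prints: 10 20
    return 0;
}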

// Utility function that simply binds an FBO and reads from it, without any
// transformations. Returns the ID of the texture unit it was bound to
static int pass_read_fbo(struct gl_video *p, struct fbotex *fbo)
{
    struct img_tex tex = img_tex_fbo(fbo, PLANE_RGB, p->components);
    copy_img_tex(p, &(int){0}, tex);

    return pass_bind(p, tex);
}
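The &(int){0} argument above is a C99 compound literal: an unnamed, writable int used as a throwaway coordinate counter. A minimal sketch of the idiom (bump is a hypothetical helper standing in for copy_img_tex's coord parameter, not mpv API):

/* Illustrative sketch, not part of video.c. */
#include <stdio.h>

// Hypothetical helper: increments *coord by the number of components used.
static void bump(int *coord, int n)
{
    *coord += n;
}

int main(void)
{
    // (int){0} is an unnamed, writable int object, so its address can be
    // passed to a function that expects an in/out counter the caller
    // does not care about.
    bump(&(int){0}, 3);    // counter is discarded afterwards

    int coord = 0;         // contrast: keeping the counter
    bump(&coord, 3);
    printf("%d\n", coord); // prints 3
    return 0;
}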

// yuv conversion, and any other conversions before main up/down-scaling
static void pass_convert_yuv(struct gl_video *p)
{
    struct gl_shader_cache *sc = p->sc;

    struct mp_csp_params cparams = MP_CSP_PARAMS_DEFAULTS;
    cparams.gray = p->is_yuv && !p->is_packed_yuv && p->plane_count == 1;
    cparams.input_bits = p->image_desc.component_bits;
    cparams.texture_bits = p->image_desc.component_full_bits;
    mp_csp_set_image_params(&cparams, &p->image_params);
    mp_csp_copy_equalizer_values(&cparams, &p->video_eq);
    p->user_gamma = 1.0 / (cparams.gamma * p->opts.gamma);

    GLSLF("// color conversion\n");

    if (p->color_swizzle[0])
        GLSLF("color = color.%s;\n", p->color_swizzle);

    // Pre-colormatrix input gamma correction
    if (cparams.colorspace == MP_CSP_XYZ)
        GLSL(color.rgb = pow(color.rgb, vec3(2.6));) // linear light

    // We always explicitly normalize the range in pass_read_video
    cparams.input_bits = cparams.texture_bits = 0;

    // Conversion to RGB. For RGB itself, this still applies e.g. brightness
    // and contrast controls, or expansion of e.g. LSB-packed 10 bit data.
    struct mp_cmat m = {{{0}}};
    mp_get_csp_matrix(&cparams, &m);
    gl_sc_uniform_mat3(sc, "colormatrix", true, &m.m[0][0]);
    gl_sc_uniform_vec3(sc, "colormatrix_c", m.c);

    GLSL(color.rgb = mat3(colormatrix) * color.rgb + colormatrix_c;)

    if (p->image_params.colorspace == MP_CSP_BT_2020_C) {
        // Conversion for C'rcY'cC'bc via the BT.2020 CL system:
        // C'bc = (B'-Y'c) / 1.9404  | C'bc <= 0
        //      = (B'-Y'c) / 1.5816  | C'bc >  0
        //
        // C'rc = (R'-Y'c) / 1.7184  | C'rc <= 0
        //      = (R'-Y'c) / 0.9936  | C'rc >  0
        //
        // as per the BT.2020 specification, table 4. This is a non-linear
        // transformation because (constant) luminance receives non-equal
        // contributions from the three different channels.
        GLSLF("// constant luminance conversion\n");
        GLSL(color.br = color.br * mix(vec2(1.5816, 0.9936),
                                       vec2(1.9404, 1.7184),
                                       lessThanEqual(color.br, vec2(0)))
                        + color.gg;)
        // Expand channels to camera-linear light. This shader currently just
        // assumes everything uses the BT.2020 12-bit gamma function, since the
        // difference between 10 and 12-bit is negligible for anything other
        // than 12-bit content.
        GLSL(color.rgb = mix(color.rgb / vec3(4.5),
                             pow((color.rgb + vec3(0.0993))/vec3(1.0993), vec3(1.0/0.45)),
                             lessThanEqual(vec3(0.08145), color.rgb));)
        // Calculate the green channel from the expanded RYcB
        // The BT.2020 specification says Yc = 0.2627*R + 0.6780*G + 0.0593*B
        GLSL(color.g = (color.g - 0.2627*color.r - 0.0593*color.b)/0.6780;)
        // Recompress to receive the R'G'B' result, same as other systems
        GLSL(color.rgb = mix(color.rgb * vec3(4.5),
                             vec3(1.0993) * pow(color.rgb, vec3(0.45)) - vec3(0.0993),
                             lessThanEqual(vec3(0.0181), color.rgb));)
    }

    p->components = 3;
    if (!p->has_alpha || p->opts.alpha_mode == 0) { // none
        GLSL(color.a = 1.0;)
    } else if (p->opts.alpha_mode == 2) { // blend against black
        GLSL(color = vec4(color.rgb * color.a, 1.0);)
    } else { // alpha present in image
        p->components = 4;
        GLSL(color = vec4(color.rgb * color.a, color.a);)
    }
}
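For reference, the constant-luminance branch above can be reproduced on the CPU. A minimal scalar sketch, assuming the chroma-difference step has already produced B' and R', and using the same BT.2020 constants as the shader (the helper name and the input values are hypothetical, not mpv API):

/* Illustrative sketch, not part of video.c. */
#include <math.h>
#include <stdio.h>

// Inverse of the BT.2020 (12-bit) OETF, same branch point as the mix() above.
static double bt2020_to_linear(double v)
{
    return v < 0.08145 ? v / 4.5 : pow((v + 0.0993) / 1.0993, 1.0 / 0.45);
}

int main(void)
{
    // Hypothetical post-chroma-step values: R', Y'c (stored in the green
    // channel) and B'.
    double r = 0.25, yc = 0.5, b = 0.75;

    // Linearize all three, exactly like the shader does for color.rgb.
    double R = bt2020_to_linear(r);
    double Yc = bt2020_to_linear(yc);
    double B = bt2020_to_linear(b);

    // Yc = 0.2627*R + 0.6780*G + 0.0593*B, solved for G:
    double G = (Yc - 0.2627 * R - 0.0593 * B) / 0.6780;

    printf("linear G = %f\n", G);
    return 0;
}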

static void get_scale_factors(struct gl_video *p, bool transpose_rot, double xy[2])
{
    double target_w = p->src_rect.x1 - p->src_rect.x0;
    double target_h = p->src_rect.y1 - p->src_rect.y0;
    if (transpose_rot && p->image_params.rotate % 180 == 90)
        MPSWAP(double, target_w, target_h);
    xy[0] = (p->dst_rect.x1 - p->dst_rect.x0) / target_w;
    xy[1] = (p->dst_rect.y1 - p->dst_rect.y0) / target_h;
}
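A worked example of what get_scale_factors computes, with hypothetical rectangle sizes (plain doubles instead of p->src_rect/p->dst_rect, and a local SWAP macro in place of MPSWAP):

/* Illustrative sketch, not part of video.c. */
#include <stdio.h>

#define SWAP(T, a, b) do { T tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

int main(void)
{
    // Hypothetical case: a source cropped to 1920x800, displayed in a
    // 1280x720 window, with the video rotated by 90 degrees.
    double src_w = 1920, src_h = 800;
    double dst_w = 1280, dst_h = 720;
    int rotate = 90;

    double target_w = src_w, target_h = src_h;
    if (rotate % 180 == 90)          // same test as get_scale_factors()
        SWAP(double, target_w, target_h);

    double xy[2] = { dst_w / target_w, dst_h / target_h };

    // pass_scale_main treats any factor below 1.0 as downscaling.
    printf("scale x=%f y=%f -> %s\n", xy[0], xy[1],
           (xy[0] < 1.0 || xy[1] < 1.0) ? "downscaling" : "upscaling/1:1");
    return 0;
}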

// Cropping.
static void compute_src_transform(struct gl_video *p, struct gl_transform *tr)
{
    float sx = (p->src_rect.x1 - p->src_rect.x0) / (float)p->texture_w,
          sy = (p->src_rect.y1 - p->src_rect.y0) / (float)p->texture_h,
          ox = p->src_rect.x0,
          oy = p->src_rect.y0;
    struct gl_transform transform = {{{sx, 0}, {0, sy}}, {ox, oy}};

    gl_transform_trans(p->texture_offset, &transform);

    *tr = transform;
}
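A small numeric sketch of the transform built above, with hypothetical crop values. It assumes the transform is applied to coordinates spanning the full texture size, in which case the corners of that range land exactly on the crop rectangle:

/* Illustrative sketch, not part of video.c. */
#include <stdio.h>

int main(void)
{
    // Hypothetical values: a 1920x1080 texture cropped to x:[240,1680],
    // y:[0,1080] (a centered 4:3 crop).
    float texture_w = 1920, texture_h = 1080;
    float x0 = 240, x1 = 1680, y0 = 0, y1 = 1080;

    // Same construction as compute_src_transform():
    float sx = (x1 - x0) / texture_w, sy = (y1 - y0) / texture_h;
    float ox = x0, oy = y0;

    float cx[2] = {0, texture_w}, cy[2] = {0, texture_h};
    for (int i = 0; i < 2; i++)
        printf("(%g, %g) -> (%g, %g)\n", cx[i], cy[i],
               sx * cx[i] + ox, sy * cy[i] + oy);
    // prints: (0, 0) -> (240, 0) and (1920, 1080) -> (1680, 1080)
    return 0;
}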
|
|
|
|
|
2015-03-15 21:52:34 +00:00
|
|
|
// Takes care of the main scaling and pre/post-conversions
|
|
|
|
static void pass_scale_main(struct gl_video *p)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
|
|
|
// Figure out the main scaler.
|
|
|
|
double xy[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, true, xy);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
|
|
|
// actual scale factor should be divided by the scale factor of prescaling.
|
|
|
|
xy[0] /= p->texture_offset.m[0][0];
|
|
|
|
xy[1] /= p->texture_offset.m[1][1];
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
bool downscaling = xy[0] < 1.0 || xy[1] < 1.0;
|
|
|
|
bool upscaling = !downscaling && (xy[0] > 1.0 || xy[1] > 1.0);
|
|
|
|
double scale_factor = 1.0;
|
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
struct scaler *scaler = &p->scaler[SCALER_SCALE];
|
|
|
|
struct scaler_config scaler_conf = p->opts.scaler[SCALER_SCALE];
|
2015-10-26 22:43:48 +00:00
|
|
|
if (p->opts.scaler_resizes_only && !downscaling && !upscaling) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
scaler_conf.kernel.name = "bilinear";
|
2016-04-17 11:07:14 +00:00
|
|
|
// For scaler-resizes-only, we round the texture offset to
|
|
|
|
// the nearest round value in order to prevent ugly blurriness
|
|
|
|
// (in exchange for slightly shifting the image by up to half a
|
|
|
|
// subpixel)
|
|
|
|
p->texture_offset.t[0] = roundf(p->texture_offset.t[0]);
|
|
|
|
p->texture_offset.t[1] = roundf(p->texture_offset.t[1]);
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
2016-03-05 08:42:57 +00:00
|
|
|
if (downscaling && p->opts.scaler[SCALER_DSCALE].kernel.name) {
|
|
|
|
scaler_conf = p->opts.scaler[SCALER_DSCALE];
|
|
|
|
scaler = &p->scaler[SCALER_DSCALE];
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-11-07 16:49:14 +00:00
|
|
|
// When requesting correct-downscaling and the clip is anamorphic, and
|
|
|
|
// because only a single scale factor is used for both axes, enable it only
|
2015-08-10 00:57:53 +00:00
|
|
|
// when both axes are downscaled, and use the milder of the factors to not
|
|
|
|
// end up with too much blur on one axis (even if we end up with sub-optimal
|
2015-11-07 16:49:14 +00:00
|
|
|
// scale factor on the other axis). This is better than not respecting
|
|
|
|
// correct scaling at all for anamorphic clips.
|
2015-08-10 00:57:53 +00:00
|
|
|
double f = MPMAX(xy[0], xy[1]);
|
2015-11-07 16:49:14 +00:00
|
|
|
if (p->opts.correct_downscaling && f < 1.0)
|
2015-08-10 00:57:53 +00:00
|
|
|
scale_factor = 1.0 / f;
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Pre-conversion, like linear light/sigmoidization
|
|
|
|
GLSLF("// scaler pre-conversion\n");
|
2016-04-19 18:45:40 +00:00
|
|
|
if (p->use_linear) {
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_linearize(p->sc, p->image_params.gamma);
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "LINEAR", NULL);
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-03-15 21:52:34 +00:00
|
|
|
bool use_sigmoid = p->use_linear && p->opts.sigmoid_upscaling && upscaling;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
float sig_center, sig_slope, sig_offset, sig_scale;
|
|
|
|
if (use_sigmoid) {
|
|
|
|
// Coefficients for the sigmoidal transform are taken from the
|
|
|
|
// formula here: http://www.imagemagick.org/Usage/color_mods/#sigmoidal
|
|
|
|
sig_center = p->opts.sigmoid_center;
|
|
|
|
sig_slope = p->opts.sigmoid_slope;
|
|
|
|
// This function needs to go through (0,0) and (1,1) so we compute the
|
|
|
|
// values at 1 and 0, and then scale/shift them, respectively.
|
|
|
|
sig_offset = 1.0/(1+expf(sig_slope * sig_center));
|
|
|
|
sig_scale = 1.0/(1+expf(sig_slope * (sig_center-1))) - sig_offset;
|
|
|
|
GLSLF("color.rgb = %f - log(1.0/(color.rgb * %f + %f) - 1.0)/%f;\n",
|
|
|
|
sig_center, sig_scale, sig_offset, sig_slope);
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "SIGMOID", NULL);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "PREKERNEL", NULL);
|
|
|
|
|
2016-03-28 14:30:48 +00:00
|
|
|
int vp_w = p->dst_rect.x1 - p->dst_rect.x0;
|
|
|
|
int vp_h = p->dst_rect.y1 - p->dst_rect.y0;
|
2015-09-07 19:02:49 +00:00
|
|
|
struct gl_transform transform;
|
2016-03-28 14:30:48 +00:00
|
|
|
compute_src_transform(p, &transform);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// main scaling\n");
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &p->indirect_fbo, p->texture_w, p->texture_h, 0);
|
2016-04-16 16:14:32 +00:00
|
|
|
struct img_tex src = img_tex_fbo(&p->indirect_fbo, PLANE_RGB, p->components);
|
|
|
|
gl_transform_trans(transform, &src.transform);
|
2016-03-05 11:38:51 +00:00
|
|
|
pass_sample(p, src, scaler, &scaler_conf, scale_factor, vp_w, vp_h);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-10-23 17:52:03 +00:00
|
|
|
// Changes the texture size to display size after main scaler.
|
|
|
|
p->texture_w = vp_w;
|
|
|
|
p->texture_h = vp_h;
|
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "POSTKERNEL", NULL);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
GLSLF("// scaler post-conversion\n");
|
|
|
|
if (use_sigmoid) {
|
|
|
|
// Inverse of the transformation above
|
|
|
|
GLSLF("color.rgb = (1.0/(1.0 + exp(%f * (%f - color.rgb))) - %f) / %f;\n",
|
|
|
|
sig_slope, sig_center, sig_offset, sig_scale);
|
|
|
|
}
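For reference, this "inverse of the transformation above" can be read directly off the shader line: writing k = sig_slope, c = sig_center, o = sig_offset and s = sig_scale, the sigmoidal compression and the desigmoidization applied here form the pair

\[
y = c - \frac{1}{k}\ln\!\left(\frac{1}{s\,x + o} - 1\right)
\qquad\Longleftrightarrow\qquad
x = \frac{1}{s}\left(\frac{1}{1 + e^{\,k\,(c - y)}} - o\right),
\]

where x is the linear-light value and y the sigmoid-space value the scaler operated on; the left-hand form is simply the algebraic inverse of the expression in the GLSL above, shown here for clarity.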
|
2015-03-15 21:52:34 +00:00
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-03-27 10:12:46 +00:00
|
|
|
// Adapts the colors from the given color space to the display device's native
|
|
|
|
// gamut.
|
|
|
|
static void pass_colormanage(struct gl_video *p, enum mp_csp_prim prim_src,
|
|
|
|
enum mp_csp_trc trc_src)
|
2015-03-15 21:52:34 +00:00
|
|
|
{
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
GLSLF("// color management\n");
|
|
|
|
enum mp_csp_trc trc_dst = p->opts.target_trc;
|
2015-03-27 10:12:46 +00:00
|
|
|
enum mp_csp_prim prim_dst = p->opts.target_prim;
|
2015-03-23 01:42:19 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
if (p->use_lut_3d) {
|
2016-02-13 14:33:00 +00:00
|
|
|
// The 3DLUT is always generated against the original source space
|
|
|
|
enum mp_csp_prim prim_orig = p->image_params.primaries;
|
|
|
|
enum mp_csp_trc trc_orig = p->image_params.gamma;
|
|
|
|
|
|
|
|
if (gl_video_get_lut3d(p, prim_orig, trc_orig)) {
|
|
|
|
prim_dst = prim_orig;
|
|
|
|
trc_dst = trc_orig;
|
|
|
|
} else {
|
|
|
|
p->use_lut_3d = false;
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (prim_dst == MP_CSP_PRIM_AUTO)
|
|
|
|
prim_dst = prim_src;
|
|
|
|
if (trc_dst == MP_CSP_TRC_AUTO) {
|
2015-03-27 10:12:46 +00:00
|
|
|
trc_dst = trc_src;
|
|
|
|
// Avoid outputting linear light at all costs
|
|
|
|
if (trc_dst == MP_CSP_TRC_LINEAR)
|
|
|
|
trc_dst = p->image_params.gamma;
|
|
|
|
if (trc_dst == MP_CSP_TRC_LINEAR)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
trc_dst = MP_CSP_TRC_GAMMA22;
|
|
|
|
}
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
bool need_gamma = trc_src != trc_dst || prim_src != prim_dst;
|
2015-03-29 04:34:34 +00:00
|
|
|
if (need_gamma)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_linearize(p->sc, trc_src);
|
2016-02-13 14:33:00 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Adapt to the right colorspace if necessary
|
|
|
|
if (prim_src != prim_dst) {
|
|
|
|
struct mp_csp_primaries csp_src = mp_get_csp_primaries(prim_src),
|
|
|
|
csp_dst = mp_get_csp_primaries(prim_dst);
|
|
|
|
float m[3][3] = {{0}};
|
|
|
|
mp_get_cms_matrix(csp_src, csp_dst, MP_INTENT_RELATIVE_COLORIMETRIC, m);
|
|
|
|
gl_sc_uniform_mat3(p->sc, "cms_matrix", true, &m[0][0]);
|
|
|
|
GLSL(color.rgb = cms_matrix * color.rgb;)
|
|
|
|
}
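Some background on why a single mat3 is enough here (general colorimetry, not a statement about mp_get_cms_matrix's internals): in linear light, a change of RGB primaries is the source's RGB-to-XYZ matrix followed by the destination's XYZ-to-RGB matrix, with a chromatic adaptation matrix in between for the relative colorimetric intent, and the whole chain collapses into one constant 3x3 matrix:

\[
\begin{pmatrix} R' \\ G' \\ B' \end{pmatrix}
= M_{\mathrm{dst}}^{-1}\, A \, M_{\mathrm{src}}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\]

This is also why the conversion is bracketed by the pass_linearize/pass_delinearize calls above and below: the matrix is only valid on linear RGB.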
|
2016-02-13 14:33:00 +00:00
|
|
|
|
|
|
|
if (need_gamma)
|
|
|
|
pass_delinearize(p->sc, trc_dst);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
if (p->use_lut_3d) {
|
|
|
|
gl_sc_uniform_sampler(p->sc, "lut_3d", GL_TEXTURE_3D, TEXUNIT_3DLUT);
|
|
|
|
GLSL(color.rgb = texture3D(lut_3d, color.rgb).rgb;)
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
2014-11-23 19:06:05 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
static void pass_dither(struct gl_video *p)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
// Assume 8 bits per component if unknown.
|
2015-12-19 10:56:19 +00:00
|
|
|
int dst_depth = gl->fb_g ? gl->fb_g : 8;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
if (p->opts.dither_depth > 0)
|
|
|
|
dst_depth = p->opts.dither_depth;
|
|
|
|
|
|
|
|
if (p->opts.dither_depth < 0 || p->opts.dither_algo < 0)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!p->dither_texture) {
|
|
|
|
MP_VERBOSE(p, "Dither to %d.\n", dst_depth);
|
|
|
|
|
|
|
|
int tex_size;
|
|
|
|
void *tex_data;
|
2016-05-12 18:08:49 +00:00
|
|
|
GLint tex_iformat = 0;
|
|
|
|
GLint tex_format = 0;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLenum tex_type;
|
|
|
|
unsigned char temp[256];
|
|
|
|
|
|
|
|
if (p->opts.dither_algo == 0) {
|
|
|
|
int sizeb = p->opts.dither_size;
|
|
|
|
int size = 1 << sizeb;
|
|
|
|
|
|
|
|
if (p->last_dither_matrix_size != size) {
|
|
|
|
p->last_dither_matrix = talloc_realloc(p, p->last_dither_matrix,
|
|
|
|
float, size * size);
|
|
|
|
mp_make_fruit_dither_matrix(p->last_dither_matrix, sizeb);
|
|
|
|
p->last_dither_matrix_size = size;
|
|
|
|
}
|
|
|
|
|
2015-12-08 22:22:08 +00:00
|
|
|
// Prefer R16 textures since they provide higher precision.
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_unorm_format(gl, 2, 1);
|
|
|
|
if (!fmt || gl->es)
|
|
|
|
fmt = gl_find_float16_format(gl, 1);
|
|
|
|
tex_size = size;
|
|
|
|
if (fmt) {
|
2015-12-08 22:22:08 +00:00
|
|
|
tex_iformat = fmt->internal_format;
|
|
|
|
tex_format = fmt->format;
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
tex_type = GL_FLOAT;
|
|
|
|
tex_data = p->last_dither_matrix;
|
|
|
|
} else {
|
|
|
|
assert(sizeof(temp) >= 8 * 8);
|
|
|
|
mp_make_ordered_dither_matrix(temp, 8);
|
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_unorm_format(gl, 1, 1);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
tex_size = 8;
|
|
|
|
tex_iformat = fmt->internal_format;
|
|
|
|
tex_format = fmt->format;
|
|
|
|
tex_type = fmt->type;
|
|
|
|
tex_data = temp;
|
|
|
|
}
|
|
|
|
|
|
|
|
p->dither_size = tex_size;
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + TEXUNIT_DITHER);
|
|
|
|
gl->GenTextures(1, &p->dither_texture);
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, p->dither_texture);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 1);
|
|
|
|
gl->TexImage2D(GL_TEXTURE_2D, 0, tex_iformat, tex_size, tex_size, 0,
|
2016-05-12 18:08:49 +00:00
|
|
|
tex_format, tex_type, tex_data);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 4);
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2015-02-19 13:03:18 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
debug_check_gl(p, "dither setup");
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provide)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separate function, which is called from
gl_video_interpolate_frame only if it's actually necessary, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
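As a purely illustrative sketch of the "cheap linear interpolation" this refers to (not mpv's actual interpolation code; the function and names are made up): once the neighbouring frames are already rendered into cached FBOs, the per-vsync work reduces to computing a blend weight from the vsync time and the two frames' PTS values, plus the blend itself.

#include <math.h>

// Illustrative only: blend weight between two cached, already-rendered
// frames for a given display (vsync) time.
static double inter_coefficient(double vsync_time, double pts0, double pts1)
{
    double c = (vsync_time - pts0) / (pts1 - pts0);
    return fmin(fmax(c, 0.0), 1.0);   // clamp to [0, 1]
}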
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// dithering\n");
|
|
|
|
|
|
|
|
// This defines how many bits are considered significant for output on
|
|
|
|
// screen. The superfluous bits will be used for rounding according to the
|
|
|
|
// dither matrix. The precision of the source implicitly decides how many
|
|
|
|
// dither patterns can be visible.
|
2015-03-16 19:22:09 +00:00
|
|
|
int dither_quantization = (1 << dst_depth) - 1;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
gl_sc_uniform_sampler(p->sc, "dither", GL_TEXTURE_2D, TEXUNIT_DITHER);
|
|
|
|
|
2015-11-19 13:41:49 +00:00
|
|
|
GLSLF("vec2 dither_pos = gl_FragCoord.xy / %d.0;\n", p->dither_size);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
if (p->opts.temporal_dither) {
|
2015-07-20 17:09:22 +00:00
|
|
|
int phase = (p->frames_rendered / p->opts.temporal_dither_period) % 8u;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
float r = phase * (M_PI / 2); // rotate
|
|
|
|
float m = phase < 4 ? 1 : -1; // mirror
|
|
|
|
|
|
|
|
float matrix[2][2] = {{cos(r), -sin(r) },
|
|
|
|
{sin(r) * m, cos(r) * m}};
|
|
|
|
gl_sc_uniform_mat2(p->sc, "dither_trafo", true, &matrix[0][0]);
|
|
|
|
|
|
|
|
GLSL(dither_pos = dither_trafo * dither_pos;)
|
2015-02-19 13:03:18 +00:00
|
|
|
}
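What the 2x2 matrix above does, spelled out (readable directly from the code): with r = phase * 90 degrees and the mirror factor m = +1 or -1,

\[
T_{\mathrm{phase}} =
\begin{pmatrix} \cos r & -\sin r \\ m\,\sin r & m\,\cos r \end{pmatrix},
\qquad \det T_{\mathrm{phase}} = m,
\]

so phases 0-3 are the four 90 degree rotations of the dither texture coordinates and phases 4-7 are their mirror images; together these are the eight symmetries of the square, which lets the same small dither matrix produce eight visually distinct patterns over time and break up static dither noise.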
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSL(float dither_value = texture(dither, dither_pos).r;)
|
2015-11-19 13:41:49 +00:00
|
|
|
GLSLF("color = floor(color * %d.0 + dither_value + 0.5 / %d.0) / %d.0;\n",
|
|
|
|
dither_quantization, p->dither_size * p->dither_size,
|
2015-03-16 19:22:09 +00:00
|
|
|
dither_quantization);
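Written out, the quantization above is (with q = dither_quantization = 2^dst_depth - 1, d = p->dither_size, and delta the value sampled from the dither texture, nominally covering [0, 1) in steps of 1/d^2):

\[
c_{\mathrm{out}} \;=\; \frac{1}{q}\left\lfloor c\,q \;+\; \delta \;+\; \frac{1}{2d^{2}} \right\rfloor
\]

so the output takes one of the 2^dst_depth representable levels k/q, and the dither value plus the half-step bias stands in for the fixed 0.5 that plain rounding would use.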
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
re-adds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
2015-03-29 04:34:34 +00:00
|
|
|
// Draws the OSD, in scene-referred colors. If cms is true, subtitles are
|
|
|
|
// instead adapted to the display's gamut.
|
2015-03-23 01:42:19 +00:00
|
|
|
static void pass_draw_osd(struct gl_video *p, int draw_flags, double pts,
|
|
|
|
struct mp_osd_res rect, int vp_w, int vp_h, int fbo,
|
2015-03-29 04:34:34 +00:00
|
|
|
bool cms)
|
2015-03-23 01:42:19 +00:00
|
|
|
{
|
|
|
|
mpgl_osd_generate(p->osd, rect, pts, p->image_params.stereo_out, draw_flags);
|
|
|
|
|
|
|
|
p->gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
for (int n = 0; n < MAX_OSD_PARTS; n++) {
|
|
|
|
enum sub_bitmap_format fmt = mpgl_osd_get_part_format(p->osd, n);
|
|
|
|
if (!fmt)
|
|
|
|
continue;
|
|
|
|
gl_sc_uniform_sampler(p->sc, "osdtex", GL_TEXTURE_2D, 0);
|
|
|
|
switch (fmt) {
|
|
|
|
case SUBBITMAP_RGBA: {
|
|
|
|
GLSLF("// OSD (RGBA)\n");
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = texture(osdtex, texcoord).bgra;)
|
2015-03-23 01:42:19 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
case SUBBITMAP_LIBASS: {
|
|
|
|
GLSLF("// OSD (libass)\n");
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color =
|
2015-03-23 01:42:19 +00:00
|
|
|
vec4(ass_color.rgb, ass_color.a * texture(osdtex, texcoord).r);)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
abort();
|
|
|
|
}
|
2015-03-29 04:34:34 +00:00
|
|
|
// Subtitle color management; subtitles are assumed to be sRGB by default
|
|
|
|
if (cms)
|
2015-03-27 10:12:46 +00:00
|
|
|
pass_colormanage(p, MP_CSP_PRIM_BT_709, MP_CSP_TRC_SRGB);
|
2015-03-23 01:42:19 +00:00
|
|
|
gl_sc_set_vao(p->sc, mpgl_osd_get_vao(p->osd));
|
|
|
|
gl_sc_gen_shader_and_reset(p->sc);
|
|
|
|
mpgl_osd_draw_part(p->osd, vp_w, vp_h, n);
|
|
|
|
}
|
|
|
|
gl_sc_set_vao(p->sc, &p->vao);
|
|
|
|
}
|
|
|
|
|
vo_opengl: restore single pass optimization as separate code path
The single-pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
// Minimal rendering code path, for GLES or OpenGL 2.1 without proper FBOs.
|
|
|
|
static void pass_render_frame_dumb(struct gl_video *p, int fbo)
|
|
|
|
{
|
|
|
|
p->gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct img_tex tex[4];
|
2016-04-16 16:14:32 +00:00
|
|
|
struct gl_transform off[4];
|
|
|
|
pass_get_img_tex(p, &p->image, tex, off);
|
vo_opengl: restore single pass optimization as separate code path
The single-pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
|
|
|
|
struct gl_transform transform;
|
2016-03-28 14:30:48 +00:00
|
|
|
compute_src_transform(p, &transform);
|
vo_opengl: restore single pass optimization as separate code path
The single-pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int index = 0;
|
|
|
|
for (int i = 0; i < p->plane_count; i++) {
|
2016-04-16 16:14:32 +00:00
|
|
|
struct gl_transform trel = {{{(float)p->texture_w / tex[i].w, 0.0},
|
|
|
|
{0.0, (float)p->texture_h / tex[i].h}}};
|
|
|
|
gl_transform_trans(trel, &tex[i].transform);
|
|
|
|
gl_transform_trans(transform, &tex[i].transform);
|
|
|
|
gl_transform_trans(off[i], &tex[i].transform);
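The three gl_transform_trans calls compose affine coordinate transforms. As a generic reminder (plain math, independent of mpv's struct layout or argument order), composing two affine maps A(v) = M_A v + t_A and B(v) = M_B v + t_B gives

\[
(B \circ A)(v) \;=\; M_B M_A\, v \;+\; M_B\, t_A \;+\; t_B,
\]

which is why the per-plane relative scale (trel), the source crop/rotation (transform) and the per-plane offset (off[i]) can all be accumulated into tex[i].transform before drawing.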
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
copy_img_tex(p, &index, tex[i]);
|
vo_opengl: restore single pass optimization as separate code path
The single-pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
pass_convert_yuv(p);
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// The main rendering function; it takes care of everything up to and including
|
2015-07-17 21:21:04 +00:00
|
|
|
// upscaling. p->image is rendered.
|
|
|
|
static void pass_render_frame(struct gl_video *p)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
{
|
2015-10-23 17:52:03 +00:00
|
|
|
// initialize the texture parameters
|
|
|
|
p->texture_w = p->image_params.w;
|
|
|
|
p->texture_h = p->image_params.h;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
p->texture_offset = identity_trans;
|
2016-03-05 11:38:51 +00:00
|
|
|
p->components = 0;
|
2016-04-16 16:14:32 +00:00
|
|
|
p->saved_tex_num = 0;
|
2016-04-19 18:45:40 +00:00
|
|
|
p->hook_fbo_num = 0;
|
2015-10-23 17:52:03 +00:00
|
|
|
|
2016-04-08 20:21:31 +00:00
|
|
|
if (p->image_params.rotate % 180 == 90)
|
|
|
|
MPSWAP(int, p->texture_w, p->texture_h);
|
|
|
|
|
2015-11-19 20:22:24 +00:00
|
|
|
if (p->dumb_mode)
|
vo_opengl: restore single pass optimization as separate code path
The single-pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
return;
|
|
|
|
|
2015-04-02 09:13:51 +00:00
|
|
|
p->use_linear = p->opts.linear_scaling || p->opts.sigmoid_upscaling;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
pass_read_video(p);
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "NATIVE", &p->texture_offset);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
pass_convert_yuv(p);
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "MAINPRESUB", &p->texture_offset);
|
2015-04-10 20:19:44 +00:00
|
|
|
|
|
|
|
// For subtitles
|
|
|
|
double vpts = p->image.mpi->pts;
|
|
|
|
if (vpts == MP_NOPTS_VALUE)
|
|
|
|
vpts = p->osd_pts;
|
|
|
|
|
2015-04-11 17:16:34 +00:00
|
|
|
if (p->osd && p->opts.blend_subs == 2) {
|
2015-04-11 13:53:00 +00:00
|
|
|
double scale[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, false, scale);
|
2015-04-11 13:53:00 +00:00
|
|
|
struct mp_osd_res rect = {
|
2015-10-23 17:52:03 +00:00
|
|
|
.w = p->texture_w, .h = p->texture_h,
|
2015-04-11 13:53:00 +00:00
|
|
|
.display_par = scale[1] / scale[0], // counter compensate scaling
|
|
|
|
};
|
2016-04-05 18:58:22 +00:00
|
|
|
finish_pass_fbo(p, &p->blend_subs_fbo, rect.w, rect.h, 0);
|
2015-10-23 17:52:03 +00:00
|
|
|
pass_draw_osd(p, OSD_DRAW_SUB_ONLY, vpts, rect,
|
2016-04-05 18:58:22 +00:00
|
|
|
rect.w, rect.h, p->blend_subs_fbo.fbo, false);
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = texture(texture0, texcoord0);)
|
2016-03-22 12:34:39 +00:00
|
|
|
pass_read_fbo(p, &p->blend_subs_fbo);
|
2015-04-10 20:19:44 +00:00
|
|
|
}
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "MAIN", &p->texture_offset);
|
2015-09-23 20:43:27 +00:00
|
|
|
|
2015-03-15 21:52:34 +00:00
|
|
|
pass_scale_main(p);
|
2015-03-23 01:42:19 +00:00
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
int vp_w = p->dst_rect.x1 - p->dst_rect.x0,
|
|
|
|
vp_h = p->dst_rect.y1 - p->dst_rect.y0;
|
2015-04-11 17:16:34 +00:00
|
|
|
if (p->osd && p->opts.blend_subs == 1) {
|
2015-03-23 01:42:19 +00:00
|
|
|
// Recreate the real video size from the src/dst rects
|
|
|
|
struct mp_osd_res rect = {
|
|
|
|
.w = vp_w, .h = vp_h,
|
2015-10-23 17:52:03 +00:00
|
|
|
.ml = -p->src_rect.x0, .mr = p->src_rect.x1 - p->image_params.w,
|
|
|
|
.mt = -p->src_rect.y0, .mb = p->src_rect.y1 - p->image_params.h,
|
2015-03-23 01:42:19 +00:00
|
|
|
.display_par = 1.0,
|
|
|
|
};
|
|
|
|
// Adjust margins for scale
|
|
|
|
double scale[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, true, scale);
|
2015-03-23 01:42:19 +00:00
|
|
|
rect.ml *= scale[0]; rect.mr *= scale[0];
|
|
|
|
rect.mt *= scale[1]; rect.mb *= scale[1];
|
2015-03-29 04:34:34 +00:00
|
|
|
// We should always blend subtitles in non-linear light
|
|
|
|
if (p->use_linear)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_delinearize(p->sc, p->image_params.gamma);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &p->blend_subs_fbo, p->texture_w, p->texture_h,
|
2015-10-23 17:52:03 +00:00
|
|
|
FBOTEX_FUZZY);
|
|
|
|
pass_draw_osd(p, OSD_DRAW_SUB_ONLY, vpts, rect,
|
|
|
|
p->texture_w, p->texture_h, p->blend_subs_fbo.fbo, false);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
pass_read_fbo(p, &p->blend_subs_fbo);
|
2015-03-29 04:34:34 +00:00
|
|
|
if (p->use_linear)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_linearize(p->sc, p->image_params.gamma);
|
2015-03-23 01:42:19 +00:00
|
|
|
}
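The margin math above reconstructs an OSD canvas covering the full video in destination pixels: the parts cropped away by the source rect become (negative) margins, scaled by the same factors as the visible area. A self-contained sketch of that computation (hypothetical names; the real code gets the factors from get_scale_factors()):

#include <stdio.h>

struct rect { int x0, y0, x1, y1; };

// Illustration only: margins of the full video around the visible dst rect.
static void osd_margins(struct rect src, struct rect dst, int video_w, int video_h,
                        double m[4]) // ml, mr, mt, mb
{
    double sx = (double)(dst.x1 - dst.x0) / (src.x1 - src.x0);
    double sy = (double)(dst.y1 - dst.y0) / (src.y1 - src.y0);
    m[0] = -src.x0 * sx;            // left part cropped away, in display pixels
    m[1] = (src.x1 - video_w) * sx; // right part cropped away (<= 0)
    m[2] = -src.y0 * sy;
    m[3] = (src.y1 - video_h) * sy;
}

int main(void)
{
    struct rect src = { 100, 0, 1820, 1080 };  // 1920x1080 source, 100 px cropped off each side
    struct rect dst = { 0, 0, 1720, 1080 };    // shown 1:1 on screen
    double m[4];
    osd_margins(src, dst, 1920, 1080, m);
    printf("ml=%.0f mr=%.0f mt=%.0f mb=%.0f\n", m[0], m[1], m[2], m[3]); // -100 -100 0 0
    return 0;
}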
|
2015-03-27 12:27:40 +00:00
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "SCALED", NULL);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void pass_draw_to_screen(struct gl_video *p, int fbo)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
2016-05-12 09:27:00 +00:00
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2015-11-19 20:22:24 +00:00
|
|
|
if (p->dumb_mode)
|
vo_opengl: restore single pass optimization as separate code path
The single pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
pass_render_frame_dumb(p, fbo);
|
|
|
|
|
2015-03-27 10:12:46 +00:00
|
|
|
// Adjust the overall gamma before drawing to screen
|
|
|
|
if (p->user_gamma != 1) {
|
|
|
|
gl_sc_uniform_f(p->sc, "user_gamma", p->user_gamma);
|
|
|
|
GLSL(color.rgb = clamp(color.rgb, 0.0, 1.0);)
|
|
|
|
GLSL(color.rgb = pow(color.rgb, vec3(user_gamma));)
|
|
|
|
}
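The clamp guards against out-of-range values before the power function (a negative base would yield NaN), and a uniform value above 1 darkens mid-tones while a value below 1 brightens them. A tiny standalone example of the per-channel math:

#include <math.h>
#include <stdio.h>

int main(void)
{
    // Same operation the generated shader performs per channel.
    double user_gamma = 1.2, in = 0.5;
    double out = pow(fmin(fmax(in, 0.0), 1.0), user_gamma);
    printf("%.3f -> %.3f\n", in, out);  // 0.500 -> 0.435
    return 0;
}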
|
2016-03-29 20:26:24 +00:00
|
|
|
|
2015-03-27 10:12:46 +00:00
|
|
|
pass_colormanage(p, p->image_params.primaries,
|
|
|
|
p->use_linear ? MP_CSP_TRC_LINEAR : p->image_params.gamma);
|
2016-03-29 20:26:24 +00:00
|
|
|
|
|
|
|
// Draw checkerboard pattern to indicate transparency
|
|
|
|
if (p->has_alpha && p->opts.alpha_mode == 3) {
|
|
|
|
GLSLF("// transparency checkerboard\n");
|
|
|
|
GLSL(bvec2 tile = lessThan(fract(gl_FragCoord.xy / 32.0), vec2(0.5));)
|
|
|
|
GLSL(vec3 background = vec3(tile.x == tile.y ? 1.0 : 0.75);)
|
|
|
|
GLSL(color.rgb = mix(background, color.rgb, color.a);)
|
|
|
|
}
|
|
|
|
|
2016-04-19 18:45:40 +00:00
|
|
|
pass_opt_hook_point(p, "OUTPUT", NULL);
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
pass_dither(p);
|
2016-03-28 14:30:48 +00:00
|
|
|
finish_pass_direct(p, fbo, p->vp_w, p->vp_h, &p->dst_rect);
|
2016-05-12 09:27:00 +00:00
|
|
|
|
|
|
|
if (gl_sc_error_state(p->sc)) {
|
|
|
|
// Make the screen solid blue to make it visually clear that an
|
|
|
|
// error has occurred
|
|
|
|
gl->ClearColor(0.0, 0.05, 0.5, 1.0);
|
|
|
|
gl->Clear(GL_COLOR_BUFFER_BIT);
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// Draws an interpolated frame to fbo, based on the frame timing in t
|
2015-07-01 17:24:28 +00:00
|
|
|
static void gl_video_interpolate_frame(struct gl_video *p, struct vo_frame *t,
|
|
|
|
int fbo)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
{
|
|
|
|
int vp_w = p->dst_rect.x1 - p->dst_rect.x0,
|
2015-03-25 22:06:46 +00:00
|
|
|
vp_h = p->dst_rect.y1 - p->dst_rect.y0;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-06-30 23:25:30 +00:00
|
|
|
// Reset the queue completely if this is a still image, to avoid any
|
|
|
|
// interpolation artifacts from surrounding frames when unpausing or
|
|
|
|
// framestepping
|
|
|
|
if (t->still)
|
|
|
|
gl_video_reset_surfaces(p);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// First of all, figure out if we have a frame available at all, and draw
|
|
|
|
// it manually + reset the queue if not
|
2015-06-26 08:59:57 +00:00
|
|
|
if (p->surfaces[p->surface_now].pts == MP_NOPTS_VALUE) {
|
2015-07-17 21:21:04 +00:00
|
|
|
gl_video_upload_image(p, t->current);
|
|
|
|
pass_render_frame(p);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
finish_pass_fbo(p, &p->surfaces[p->surface_now].fbotex,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
vp_w, vp_h, FBOTEX_FUZZY);
|
2015-06-26 08:59:57 +00:00
|
|
|
p->surfaces[p->surface_now].pts = p->image.mpi->pts;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
p->surface_idx = p->surface_now;
|
|
|
|
}
|
|
|
|
|
2015-06-26 08:59:57 +00:00
|
|
|
// Find the right frame for this instant
|
2015-07-01 17:24:28 +00:00
|
|
|
if (t->current && t->current->pts != MP_NOPTS_VALUE) {
|
2015-06-26 08:59:57 +00:00
|
|
|
int next = fbosurface_wrap(p->surface_now + 1);
|
|
|
|
while (p->surfaces[next].pts != MP_NOPTS_VALUE &&
|
|
|
|
p->surfaces[next].pts > p->surfaces[p->surface_now].pts &&
|
2015-07-01 17:24:28 +00:00
|
|
|
p->surfaces[p->surface_now].pts < t->current->pts)
|
2015-06-26 08:59:57 +00:00
|
|
|
{
|
|
|
|
p->surface_now = next;
|
|
|
|
next = fbosurface_wrap(next + 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Figure out the queue size. For illustration, a filter radius of 2 would
|
|
|
|
// look like this: _ A [B] C D _
|
2015-11-28 14:45:35 +00:00
|
|
|
// A is surface_bse, B is surface_now, C is surface_now+1 and D is
|
2015-03-13 18:30:31 +00:00
|
|
|
// surface_end.
|
2016-03-05 08:42:57 +00:00
|
|
|
struct scaler *tscale = &p->scaler[SCALER_TSCALE];
|
|
|
|
reinit_scaler(p, tscale, &p->opts.scaler[SCALER_TSCALE], 1, tscale_sizes);
|
2015-07-11 11:55:45 +00:00
|
|
|
bool oversample = strcmp(tscale->conf.kernel.name, "oversample") == 0;
|
2015-03-15 06:11:51 +00:00
|
|
|
int size;
|
2015-03-13 18:30:31 +00:00
|
|
|
|
2015-07-11 11:55:45 +00:00
|
|
|
if (oversample) {
|
|
|
|
size = 2;
|
|
|
|
} else {
|
|
|
|
assert(tscale->kernel && !tscale->kernel->polar);
|
|
|
|
size = ceil(tscale->kernel->size);
|
|
|
|
assert(size <= TEXUNIT_VIDEO_NUM);
|
|
|
|
}
|
2015-06-26 08:59:57 +00:00
|
|
|
|
|
|
|
int radius = size/2;
|
2015-03-13 18:30:31 +00:00
|
|
|
int surface_now = p->surface_now;
|
|
|
|
int surface_bse = fbosurface_wrap(surface_now - (radius-1));
|
|
|
|
int surface_end = fbosurface_wrap(surface_now + radius);
|
|
|
|
assert(fbosurface_wrap(surface_bse + size-1) == surface_end);
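To make the _ A [B] C D _ picture concrete: a separable tscale of size 4 has radius 2, so the window starts one surface before surface_now and ends two after it, with all indices wrapped around the surface ring. A standalone sketch (assuming fbosurface_wrap() is a plain modular wrap; the ring size here is made up):

#include <assert.h>
#include <stdio.h>

#define NUM_SURFACES 7  // hypothetical ring size, not the real constant

static int wrap(int i)  // assumed behavior of fbosurface_wrap()
{
    return (i % NUM_SURFACES + NUM_SURFACES) % NUM_SURFACES;
}

int main(void)
{
    int size = 4, radius = size / 2, now = 1;
    int bse = wrap(now - (radius - 1));
    int end = wrap(now + radius);
    assert(wrap(bse + size - 1) == end);
    printf("A=%d B=%d C=%d D=%d\n", bse, now, wrap(now + 1), end); // A=0 B=1 C=2 D=3
    return 0;
}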
|
|
|
|
|
2015-06-26 08:59:57 +00:00
|
|
|
// Render new frames while there's room in the queue. Note that technically,
|
|
|
|
// this should be done before the step where we find the right frame, but
|
|
|
|
// it only barely matters at the very beginning of playback, and this way
|
|
|
|
// makes the code much more linear.
|
2015-03-13 18:30:31 +00:00
|
|
|
int surface_dst = fbosurface_wrap(p->surface_idx+1);
|
2015-07-01 17:24:28 +00:00
|
|
|
for (int i = 0; i < t->num_frames; i++) {
|
2015-06-26 08:59:57 +00:00
|
|
|
// Avoid overwriting data we might still need
|
|
|
|
if (surface_dst == surface_bse - 1)
|
|
|
|
break;
|
|
|
|
|
2015-07-01 17:24:28 +00:00
|
|
|
struct mp_image *f = t->frames[i];
|
2015-07-15 12:58:56 +00:00
|
|
|
if (!mp_image_params_equal(&f->params, &p->real_image_params) ||
|
|
|
|
f->pts == MP_NOPTS_VALUE)
|
2015-06-26 08:59:57 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (f->pts > p->surfaces[p->surface_idx].pts) {
|
2015-07-17 21:21:04 +00:00
|
|
|
gl_video_upload_image(p, f);
|
|
|
|
pass_render_frame(p);
|
2015-06-26 08:59:57 +00:00
|
|
|
finish_pass_fbo(p, &p->surfaces[surface_dst].fbotex,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
vp_w, vp_h, FBOTEX_FUZZY);
|
2015-06-26 08:59:57 +00:00
|
|
|
p->surfaces[surface_dst].pts = f->pts;
|
|
|
|
p->surface_idx = surface_dst;
|
|
|
|
surface_dst = fbosurface_wrap(surface_dst+1);
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Figure out whether the queue is "valid". A queue is invalid if the
|
|
|
|
// frames' PTS is not monotonically increasing or a PTS is missing. In
|
|
|
|
// that case, avoid blending incorrect data and just draw the latest frame as-is.
|
|
|
|
// Possible causes for failure of this condition include seeks, pausing,
|
|
|
|
// end of playback or start of playback.
|
|
|
|
bool valid = true;
|
2015-03-15 22:25:01 +00:00
|
|
|
for (int i = surface_bse, ii; valid && i != surface_end; i = ii) {
|
|
|
|
ii = fbosurface_wrap(i+1);
|
2015-06-26 08:59:57 +00:00
|
|
|
if (p->surfaces[i].pts == MP_NOPTS_VALUE ||
|
|
|
|
p->surfaces[ii].pts == MP_NOPTS_VALUE)
|
|
|
|
{
|
2015-03-13 18:30:31 +00:00
|
|
|
valid = false;
|
2015-03-15 22:25:01 +00:00
|
|
|
} else if (p->surfaces[ii].pts < p->surfaces[i].pts) {
|
|
|
|
valid = false;
|
|
|
|
MP_DBG(p, "interpolation queue underrun\n");
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
|
|
|
}
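In other words, the window is only blended if every queued PTS between surface_bse and surface_end is known and never decreases. A standalone restatement of that rule:

#include <stdbool.h>
#include <stdio.h>

#define NOPTS (-1.0)  // stand-in for MP_NOPTS_VALUE in this sketch

static bool queue_valid(const double *pts, int n)
{
    for (int i = 0; i + 1 < n; i++) {
        if (pts[i] == NOPTS || pts[i + 1] == NOPTS || pts[i + 1] < pts[i])
            return false;  // missing or backwards PTS -> don't interpolate
    }
    return true;
}

int main(void)
{
    double ok[]  = { 0.00, 0.04, 0.08, 0.12 };
    double bad[] = { 0.00, 0.04, NOPTS, 0.12 };  // e.g. right after a seek
    printf("%d %d\n", queue_valid(ok, 4), queue_valid(bad, 4));  // 1 0
    return 0;
}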
|
|
|
|
|
2015-03-25 21:40:10 +00:00
|
|
|
// Update OSD PTS to synchronize subtitles with the displayed frame
|
2015-06-26 08:59:57 +00:00
|
|
|
p->osd_pts = p->surfaces[surface_now].pts;
|
2015-03-25 21:40:10 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Finally, draw the right mix of frames to the screen.
|
2015-06-30 23:25:30 +00:00
|
|
|
if (!valid || t->still) {
|
2015-03-13 18:30:31 +00:00
|
|
|
// surface_now is guaranteed to be valid, so we can safely use it.
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
pass_read_fbo(p, &p->surfaces[surface_now].fbotex);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
p->is_interpolated = false;
|
|
|
|
} else {
|
2015-11-28 14:45:35 +00:00
|
|
|
double mix = t->vsync_offset / t->ideal_frame_duration;
|
2015-06-26 08:59:57 +00:00
|
|
|
// The scaler code always wants the fcoord to be between 0 and 1,
|
|
|
|
// so we try to adjust by using the previous set of N frames instead
|
|
|
|
// (which requires some extra checking to make sure it's valid)
|
|
|
|
if (mix < 0.0) {
|
|
|
|
int prev = fbosurface_wrap(surface_bse - 1);
|
|
|
|
if (p->surfaces[prev].pts != MP_NOPTS_VALUE &&
|
|
|
|
p->surfaces[prev].pts < p->surfaces[surface_bse].pts)
|
|
|
|
{
|
|
|
|
mix += 1.0;
|
|
|
|
surface_bse = prev;
|
|
|
|
} else {
|
|
|
|
mix = 0.0; // at least don't blow up, this should only
|
|
|
|
// ever happen at the start of playback
|
|
|
|
}
|
2015-03-15 06:11:51 +00:00
|
|
|
}
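A worked example of the fixup above, assuming vsync_offset and ideal_frame_duration use the same time unit: with 24 fps content (ideal_frame_duration of about 41.7 ms) and a vsync_offset of -10 ms, mix starts out at about -0.24; shifting the window back by one surface and adding 1.0 turns that into an fcoord of about 0.76, which is the [0,1] range the separable tscale expects.

#include <stdio.h>

int main(void)
{
    double ideal_frame_duration = 41.7e-3, vsync_offset = -10e-3;
    double mix = vsync_offset / ideal_frame_duration;  // ~ -0.24
    if (mix < 0.0)
        mix += 1.0;  // window shifted back by one surface, as above
    printf("fcoord = %.2f\n", mix);                    // ~ 0.76
    return 0;
}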
|
2015-06-26 08:59:57 +00:00
|
|
|
|
2015-07-11 11:55:45 +00:00
|
|
|
// Blend the frames together
|
|
|
|
if (oversample) {
|
2015-11-28 14:45:35 +00:00
|
|
|
double vsync_dist = t->vsync_interval / t->ideal_frame_duration,
|
2015-07-11 11:55:45 +00:00
|
|
|
threshold = tscale->conf.kernel.params[0];
|
|
|
|
threshold = isnan(threshold) ? 0.0 : threshold;
|
|
|
|
mix = (1 - mix) / vsync_dist;
|
|
|
|
mix = mix <= 0 + threshold ? 0 : mix;
|
|
|
|
mix = mix >= 1 - threshold ? 1 : mix;
|
|
|
|
mix = 1 - mix;
|
|
|
|
gl_sc_uniform_f(p->sc, "inter_coeff", mix);
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = mix(texture(texture0, texcoord0),
|
|
|
|
texture(texture1, texcoord1),
|
|
|
|
inter_coeff);)
|
2015-07-11 11:55:45 +00:00
|
|
|
} else {
|
|
|
|
gl_sc_uniform_f(p->sc, "fcoord", mix);
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_separated_gen(p->sc, tscale, 0, 0);
|
2015-07-11 11:55:45 +00:00
|
|
|
}
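For the oversample path, a worked example: 24 fps content on a 60 Hz display gives vsync_dist = (1/60) / (1/24) = 0.4. With mix = 0.8 and no threshold, the remapping yields (1 - 0.8) / 0.4 = 0.5, so inter_coeff = 1 - 0.5 = 0.5 and the two neighbouring frames are blended 50/50 for that vsync. A standalone sketch of the same arithmetic:

#include <math.h>
#include <stdio.h>

static double oversample_coeff(double mix, double vsync_interval,
                               double ideal_frame_duration, double threshold)
{
    double vsync_dist = vsync_interval / ideal_frame_duration;
    if (isnan(threshold))
        threshold = 0.0;
    mix = (1 - mix) / vsync_dist;
    mix = mix <= 0 + threshold ? 0 : mix;  // snap to the nearer frame if close
    mix = mix >= 1 - threshold ? 1 : mix;
    return 1 - mix;  // uploaded as the "inter_coeff" uniform
}

int main(void)
{
    printf("inter_coeff = %.2f\n",
           oversample_coeff(0.8, 1.0 / 60, 1.0 / 24, 0.0));  // 0.50
    return 0;
}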
|
2015-06-26 08:59:57 +00:00
|
|
|
|
|
|
|
// Load all the required frames
|
2015-03-13 18:30:31 +00:00
|
|
|
for (int i = 0; i < size; i++) {
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct img_tex img =
|
|
|
|
img_tex_fbo(&p->surfaces[fbosurface_wrap(surface_bse+i)].fbotex,
|
2016-04-16 16:14:32 +00:00
|
|
|
PLANE_RGB, p->components);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanism, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c don't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Since the code in pass_sample_separated currently assumes
|
|
|
|
// the textures are bound in-order and starting at 0, we just
|
|
|
|
// assert to make sure this is the case (which it should always be)
|
|
|
|
int id = pass_bind(p, img);
|
|
|
|
assert(id == i);
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
2015-06-26 08:59:57 +00:00
|
|
|
|
2015-11-28 14:45:35 +00:00
|
|
|
MP_DBG(p, "inter frame dur: %f vsync: %f, mix: %f\n",
|
|
|
|
t->ideal_frame_duration, t->vsync_interval, mix);
|
2015-03-13 18:30:31 +00:00
|
|
|
p->is_interpolated = true;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
pass_draw_to_screen(p, fbo);
|
2015-07-02 11:17:20 +00:00
|
|
|
|
|
|
|
p->frames_drawn += 1;
|
2014-11-23 19:06:05 +00:00
|
|
|
}
|
|
|
|
|
2015-03-23 15:28:33 +00:00
|
|
|
// (fbo==0 makes BindFramebuffer select the screen backbuffer)
|
2015-07-01 17:24:28 +00:00
|
|
|
void gl_video_render_frame(struct gl_video *p, struct vo_frame *frame, int fbo)
|
2015-03-23 15:28:33 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
|
2015-07-01 17:24:28 +00:00
|
|
|
bool has_frame = frame->current || vimg->mpi;
|
|
|
|
|
|
|
|
if (!has_frame || p->dst_rect.x0 > 0 || p->dst_rect.y0 > 0 ||
|
2015-03-23 15:28:33 +00:00
|
|
|
p->dst_rect.x1 < p->vp_w || p->dst_rect.y1 < abs(p->vp_h))
|
|
|
|
{
|
|
|
|
struct m_color c = p->opts.background;
|
|
|
|
gl->ClearColor(c.r / 255.0, c.g / 255.0, c.b / 255.0, c.a / 255.0);
|
|
|
|
gl->Clear(GL_COLOR_BUFFER_BIT);
|
|
|
|
}
|
|
|
|
|
2015-07-01 17:24:28 +00:00
|
|
|
if (has_frame) {
|
|
|
|
gl_sc_set_vao(p->sc, &p->vao);
|
2015-03-23 15:28:33 +00:00
|
|
|
|
2016-01-27 20:07:17 +00:00
|
|
|
bool interpolate = p->opts.interpolation && frame->display_synced &&
|
|
|
|
(p->frames_drawn || !frame->still);
|
|
|
|
if (interpolate) {
|
|
|
|
double ratio = frame->ideal_frame_duration / frame->vsync_interval;
|
|
|
|
if (fabs(ratio - 1.0) < p->opts.interpolation_threshold)
|
|
|
|
interpolate = false;
|
|
|
|
}
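The threshold exists so that content whose frame rate already (almost) matches the display refresh rate is shown without interpolation. For example, 24 fps on a 24 Hz display gives ratio = 1.0 and interpolation is skipped, while 24 fps on 60 Hz gives ratio = 2.5 and interpolation stays enabled. A standalone sketch of the check (the threshold value here is made up, not the option's default):

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

static bool want_interpolation(double fps, double display_hz, double threshold)
{
    // ratio = ideal_frame_duration / vsync_interval
    double ratio = (1.0 / fps) / (1.0 / display_hz);
    return fabs(ratio - 1.0) >= threshold;
}

int main(void)
{
    double threshold = 0.01;  // hypothetical --interpolation-threshold value
    printf("24 fps @ 24 Hz: %d\n", want_interpolation(24.0, 24.0, threshold)); // 0
    printf("24 fps @ 60 Hz: %d\n", want_interpolation(24.0, 60.0, threshold)); // 1
    return 0;
}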
|
2016-01-25 20:46:01 +00:00
|
|
|
|
2016-01-27 20:07:17 +00:00
|
|
|
if (interpolate) {
|
2015-07-01 17:24:28 +00:00
|
|
|
gl_video_interpolate_frame(p, frame, fbo);
|
|
|
|
} else {
|
2015-09-05 10:02:02 +00:00
|
|
|
bool is_new = !frame->redraw && !frame->repeat;
|
|
|
|
if (is_new || !p->output_fbo_valid) {
|
2015-11-15 17:30:54 +00:00
|
|
|
p->output_fbo_valid = false;
|
|
|
|
|
2015-07-17 21:21:04 +00:00
|
|
|
gl_video_upload_image(p, frame->current);
|
2015-09-05 10:02:02 +00:00
|
|
|
pass_render_frame(p);
|
|
|
|
|
2015-11-15 17:30:54 +00:00
|
|
|
// For the non-interpolation case, we draw to a single "cache"
|
|
|
|
// FBO to speed up subsequent re-draws (if any exist)
|
|
|
|
int dest_fbo = fbo;
|
|
|
|
if (frame->num_vsyncs > 1 && frame->display_synced &&
|
2015-11-19 20:22:24 +00:00
|
|
|
!p->dumb_mode && gl->BlitFramebuffer)
|
2015-10-30 11:53:43 +00:00
|
|
|
{
|
2015-11-15 17:30:54 +00:00
|
|
|
fbotex_change(&p->output_fbo, p->gl, p->log,
|
|
|
|
p->vp_w, abs(p->vp_h),
|
|
|
|
p->opts.fbo_format, 0);
|
|
|
|
dest_fbo = p->output_fbo.fbo;
|
2015-09-05 10:02:02 +00:00
|
|
|
p->output_fbo_valid = true;
|
|
|
|
}
|
2015-11-15 17:30:54 +00:00
|
|
|
pass_draw_to_screen(p, dest_fbo);
|
2015-09-05 10:02:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// "output fbo valid" and "output fbo needed" are equivalent
|
|
|
|
if (p->output_fbo_valid) {
|
2015-11-15 17:30:54 +00:00
|
|
|
gl->BindFramebuffer(GL_READ_FRAMEBUFFER, p->output_fbo.fbo);
|
|
|
|
gl->BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
|
|
|
|
struct mp_rect rc = p->dst_rect;
|
|
|
|
if (p->vp_h < 0) {
|
|
|
|
rc.y1 = -p->vp_h - p->dst_rect.y0;
|
|
|
|
rc.y0 = -p->vp_h - p->dst_rect.y1;
|
|
|
|
}
|
|
|
|
gl->BlitFramebuffer(rc.x0, rc.y0, rc.x1, rc.y1,
|
|
|
|
rc.x0, rc.y0, rc.x1, rc.y1,
|
|
|
|
GL_COLOR_BUFFER_BIT, GL_NEAREST);
|
|
|
|
gl->BindFramebuffer(GL_READ_FRAMEBUFFER, 0);
|
|
|
|
gl->BindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
|
2015-09-05 10:02:02 +00:00
|
|
|
}
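The y-flip above mirrors the destination rect when the final framebuffer is flipped (negative vp_h), so the cached image lands in the same place on screen. A small standalone example of the rectangle math:

#include <stdio.h>

struct rect { int x0, y0, x1, y1; };

// Illustration only: mirror a rect vertically for a flipped viewport.
static struct rect flip_rect_y(struct rect rc, int vp_h)
{
    if (vp_h < 0) {
        int y0 = rc.y0, y1 = rc.y1;
        rc.y1 = -vp_h - y0;
        rc.y0 = -vp_h - y1;
    }
    return rc;
}

int main(void)
{
    struct rect dst = { 0, 0, 1920, 810 };  // video occupying the top of a 1080-high buffer
    struct rect rc = flip_rect_y(dst, -1080);
    printf("y0=%d y1=%d\n", rc.y0, rc.y1);  // y0=270 y1=1080
    return 0;
}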
|
2015-07-01 17:24:28 +00:00
|
|
|
}
|
2015-03-23 15:28:33 +00:00
|
|
|
}
|
|
|
|
|
2015-06-26 08:59:57 +00:00
|
|
|
debug_check_gl(p, "after video rendering");
|
|
|
|
|
2015-03-23 15:28:33 +00:00
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
|
2015-03-23 01:42:19 +00:00
|
|
|
if (p->osd) {
|
|
|
|
pass_draw_osd(p, p->opts.blend_subs ? OSD_DRAW_OSD_ONLY : 0,
|
2015-03-29 04:34:34 +00:00
|
|
|
p->osd_pts, p->osd_rect, p->vp_w, p->vp_h, fbo, true);
|
2015-03-23 01:42:19 +00:00
|
|
|
debug_check_gl(p, "after OSD rendering");
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
gl->UseProgram(0);
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, 0);
|
|
|
|
|
2015-11-10 13:36:23 +00:00
|
|
|
// The playloop calls this last before waiting some time until it decides
|
|
|
|
// to call flip_page(). Tell OpenGL to start execution of the GPU commands
|
|
|
|
// while we sleep (this happens asynchronously).
|
|
|
|
gl->Flush();
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
p->frames_rendered++;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
// vp_w/vp_h is the implicit size of the target framebuffer.
|
|
|
|
// vp_h can be negative to flip the screen.
|
|
|
|
void gl_video_resize(struct gl_video *p, int vp_w, int vp_h,
|
2013-03-01 20:19:20 +00:00
|
|
|
struct mp_rect *src, struct mp_rect *dst,
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct mp_osd_res *osd)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
p->src_rect = *src;
|
|
|
|
p->dst_rect = *dst;
|
|
|
|
p->osd_rect = *osd;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
p->vp_w = vp_w;
|
|
|
|
p->vp_h = vp_h;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
|
|
|
gl_video_reset_surfaces(p);
|
2016-03-21 21:23:41 +00:00
|
|
|
|
2016-03-23 13:49:39 +00:00
|
|
|
if (p->osd)
|
|
|
|
mpgl_osd_resize(p->osd, p->osd_rect, p->image_params.stereo_out);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-09-02 20:45:07 +00:00
|
|
|
static bool unmap_image(struct gl_video *p, struct mp_image *mpi)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
bool ok = true;
|
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
|
|
|
struct texplane *plane = &vimg->planes[n];
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
ok = gl->UnmapBuffer(GL_PIXEL_UNPACK_BUFFER) && ok;
|
|
|
|
mpi->planes[n] = NULL; // PBO offset 0
|
|
|
|
}
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
|
|
|
return ok;
|
|
|
|
}
|
|
|
|
|
2015-09-02 10:39:19 +00:00
|
|
|
static bool map_image(struct gl_video *p, struct mp_image *mpi)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
if (!p->opts.pbo)
|
|
|
|
return false;
|
|
|
|
|
2013-03-28 19:40:19 +00:00
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
2013-03-28 19:40:19 +00:00
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2015-04-10 18:58:26 +00:00
|
|
|
mpi->stride[n] = mp_image_plane_w(mpi, n) * p->image_desc.bytes[n];
|
2015-09-02 20:45:07 +00:00
|
|
|
if (!plane->gl_buffer) {
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->GenBuffers(1, &plane->gl_buffer);
|
2015-09-02 20:45:07 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
size_t buffer_size = mp_image_plane_h(mpi, n) * mpi->stride[n];
|
|
|
|
gl->BufferData(GL_PIXEL_UNPACK_BUFFER, buffer_size,
|
2013-03-01 20:19:20 +00:00
|
|
|
NULL, GL_DYNAMIC_DRAW);
|
|
|
|
}
|
2015-09-02 20:45:07 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
mpi->planes[n] = gl->MapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
2015-09-02 20:45:07 +00:00
|
|
|
if (!mpi->planes[n]) {
|
|
|
|
unmap_image(p, mpi);
|
|
|
|
return false;
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2015-09-02 10:44:46 +00:00
|
|
|
memset(mpi->bufs, 0, sizeof(mpi->bufs));
|
2013-03-01 20:19:20 +00:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2015-07-15 10:22:49 +00:00
|
|
|
static void gl_video_upload_image(struct gl_video *p, struct mp_image *mpi)
|
2015-03-22 00:32:03 +00:00
|
|
|
{
|
2015-07-15 10:22:49 +00:00
|
|
|
GL *gl = p->gl;
|
2015-03-22 00:32:03 +00:00
|
|
|
struct video_image *vimg = &p->image;
|
2015-05-01 16:44:45 +00:00
|
|
|
|
2015-07-15 10:22:49 +00:00
|
|
|
mpi = mp_image_new_ref(mpi);
|
|
|
|
if (!mpi)
|
2015-06-26 08:59:57 +00:00
|
|
|
abort();
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
|
|
|
|
2015-07-15 10:22:49 +00:00
|
|
|
vimg->mpi = mpi;
|
2015-05-01 16:44:45 +00:00
|
|
|
p->osd_pts = mpi->pts;
|
2015-07-15 10:22:49 +00:00
|
|
|
p->frames_uploaded++;
|
2013-03-28 19:40:19 +00:00
|
|
|
|
2015-07-26 18:13:53 +00:00
|
|
|
if (p->hwdec_active) {
|
2016-05-10 16:29:10 +00:00
|
|
|
struct gl_hwdec_frame gl_frame = {0};
|
|
|
|
bool ok = p->hwdec->driver->map_frame(p->hwdec, vimg->mpi, &gl_frame) >= 0;
|
|
|
|
vimg->hwdec_mapped = true;
|
|
|
|
if (ok) {
|
|
|
|
struct mp_image layout = {0};
|
|
|
|
mp_image_set_params(&layout, &p->image_params);
|
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
|
|
|
struct gl_hwdec_plane *plane = &gl_frame.planes[n];
|
|
|
|
vimg->planes[n] = (struct texplane){
|
|
|
|
.w = mp_image_plane_w(&layout, n),
|
|
|
|
.h = mp_image_plane_h(&layout, n),
|
|
|
|
.tex_w = plane->tex_w,
|
|
|
|
.tex_h = plane->tex_h,
|
|
|
|
.gl_target = plane->gl_target,
|
|
|
|
.gl_texture = plane->gl_texture,
|
|
|
|
};
|
2016-05-10 19:11:58 +00:00
|
|
|
snprintf(vimg->planes[n].swizzle, sizeof(vimg->planes[n].swizzle),
|
|
|
|
"%s", plane->swizzle);
|
2016-05-10 16:29:10 +00:00
|
|
|
}
|
|
|
|
} else {
|
2016-04-27 11:32:20 +00:00
|
|
|
MP_FATAL(p, "Mapping hardware decoded surface failed.\n");
|
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
|
|
|
}
|
2013-11-03 23:00:18 +00:00
|
|
|
return;
|
2015-07-26 18:13:53 +00:00
|
|
|
}
|
2013-11-03 23:00:18 +00:00
|
|
|
|
|
|
|
assert(mpi->num_planes == p->plane_count);
|
|
|
|
|
2015-09-02 20:45:07 +00:00
|
|
|
mp_image_t pbo_mpi = *mpi;
|
|
|
|
bool pbo = map_image(p, &pbo_mpi);
|
|
|
|
if (pbo) {
|
|
|
|
mp_image_copy(&pbo_mpi, mpi);
|
|
|
|
if (unmap_image(p, &pbo_mpi)) {
|
|
|
|
mpi = &pbo_mpi;
|
|
|
|
} else {
|
|
|
|
MP_FATAL(p, "Video PBO upload failed. Disabling PBOs.\n");
|
|
|
|
pbo = false;
|
|
|
|
p->opts.pbo = 0;
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2015-09-02 20:45:07 +00:00
|
|
|
|
|
|
|
vimg->image_flipped = mpi->stride[0] < 0;
|
2013-11-03 23:00:18 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
2013-03-28 19:40:19 +00:00
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2015-09-02 20:45:07 +00:00
|
|
|
if (pbo)
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + n);
|
2016-02-18 09:46:03 +00:00
|
|
|
gl->BindTexture(plane->gl_target, plane->gl_texture);
|
|
|
|
glUploadTex(gl, plane->gl_target, plane->gl_format, plane->gl_type,
|
2015-09-02 20:45:07 +00:00
|
|
|
mpi->planes[n], mpi->stride[n], 0, 0, plane->w, plane->h, 0);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2014-12-20 16:22:36 +00:00
|
|
|
if (pbo)
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
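The map_image()/unmap_image()/gl_video_upload_image() trio above follows the
standard pixel-buffer-object streaming pattern: map the PBO, copy the decoded
frame into the mapping, unmap, then upload with a zero offset so the transfer
is sourced from the buffer object. A minimal standalone sketch of the same
round trip in plain OpenGL follows; it is not mpv code, upload_plane_via_pbo()
is a made-up helper name, and it assumes a current GL context in which the
GL 1.5/2.1 buffer-object entry points resolve.

#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

// Sketch: upload one tightly packed 8-bit single-channel plane through a PBO.
// 'tex' must already have storage of at least w x h (e.g. from glTexImage2D).
static int upload_plane_via_pbo(GLuint pbo, GLuint tex,
                                const unsigned char *src, int w, int h)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    // (Re)allocate backing storage; GL_DYNAMIC_DRAW mirrors the code above.
    glBufferData(GL_PIXEL_UNPACK_BUFFER, (GLsizeiptr)w * h, NULL, GL_DYNAMIC_DRAW);

    void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    if (!dst) {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
        return 0;                          // caller falls back to a plain upload
    }
    memcpy(dst, src, (size_t)w * h);       // the mp_image_copy() step above
    if (!glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER)) {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
        return 0;                          // buffer contents became undefined
    }

    // With a PBO bound, the last argument is a byte offset into the buffer.
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RED, GL_UNSIGNED_BYTE, (const void *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    return 1;
}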
2016-05-12 19:08:51 +00:00
|
|
|
static bool test_fbo(struct gl_video *p, GLint format)
|
2013-05-30 13:37:13 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
2015-09-05 09:39:20 +00:00
|
|
|
bool success = false;
|
2016-05-12 19:08:51 +00:00
|
|
|
MP_VERBOSE(p, "Testing FBO format 0x%x\n", (unsigned)format);
|
2013-05-30 13:37:13 +00:00
|
|
|
struct fbotex fbo = {0};
|
2016-05-12 19:08:51 +00:00
|
|
|
if (fbotex_init(&fbo, p->gl, p->log, 16, 16, format)) {
|
2013-05-30 13:37:13 +00:00
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo.fbo);
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, 0);
|
2015-09-05 09:39:20 +00:00
|
|
|
success = true;
|
2013-05-30 13:37:13 +00:00
|
|
|
}
|
2015-01-29 13:58:26 +00:00
|
|
|
fbotex_uninit(&fbo);
|
2013-09-11 22:57:32 +00:00
|
|
|
glCheckError(gl, p->log, "FBO test");
|
2015-09-05 09:39:20 +00:00
|
|
|
return success;
|
2013-05-30 13:37:13 +00:00
|
|
|
}
|
|
|
|
|
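test_fbo() above delegates the real work to fbotex_init(); stripped of mpv's
wrappers, checking whether a sized internal format is usable as a render
target comes down to attaching a small texture of that format to a framebuffer
object and asking for completeness. A rough standalone sketch (not mpv code;
assumes a current context with GL 3.0 / ARB_framebuffer_object entry points,
and only covers normalized or float formats):

#include <GL/gl.h>
#include <GL/glext.h>

// Sketch: return 1 if 'internal_format' (e.g. GL_RGBA16F) is color-renderable.
static int probe_fbo_format(GLint internal_format)
{
    GLuint tex = 0, fbo = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, internal_format, 16, 16, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);   // no pixel data needed
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    int ok = glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE;
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    glDeleteTextures(1, &tex);
    return ok;
}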
2015-11-19 20:22:24 +00:00
|
|
|
// Return whether dumb-mode can be used without disabling any features.
|
|
|
|
// Essentially, vo_opengl with mostly default settings will return true.
|
|
|
|
static bool check_dumb_mode(struct gl_video *p)
|
|
|
|
{
|
|
|
|
struct gl_video_opts *o = &p->opts;
|
2016-01-26 19:47:32 +00:00
|
|
|
if (p->use_integer_conversion)
|
|
|
|
return false;
|
2015-11-19 20:22:24 +00:00
|
|
|
if (o->dumb_mode)
|
|
|
|
return true;
|
|
|
|
if (o->target_prim || o->target_trc || o->linear_scaling ||
|
|
|
|
o->correct_downscaling || o->sigmoid_upscaling || o->interpolation ||
|
2016-03-05 11:02:01 +00:00
|
|
|
o->blend_subs || o->deband || o->unsharp || o->prescale_luma)
|
2015-11-19 20:22:24 +00:00
|
|
|
return false;
|
2016-03-05 08:42:57 +00:00
|
|
|
// check remaining scalers (tscale is already implicitly excluded above)
|
|
|
|
for (int i = 0; i < SCALER_COUNT; i++) {
|
|
|
|
if (i != SCALER_TSCALE) {
|
|
|
|
const char *name = o->scaler[i].kernel.name;
|
|
|
|
if (name && strcmp(name, "bilinear") != 0)
|
|
|
|
return false;
|
|
|
|
}
|
2015-11-19 20:22:24 +00:00
|
|
|
}
|
|
|
|
if (o->pre_shaders && o->pre_shaders[0])
|
|
|
|
return false;
|
|
|
|
if (o->post_shaders && o->post_shaders[0])
|
|
|
|
return false;
|
2016-04-20 23:33:13 +00:00
|
|
|
if (o->user_shaders && o->user_shaders[0])
|
|
|
|
return false;
|
2015-11-19 20:22:24 +00:00
|
|
|
if (p->use_lut_3d)
|
|
|
|
return false;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
// Disable features that are not supported with the current OpenGL version.
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
static void check_gl_features(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
2016-05-12 18:08:49 +00:00
|
|
|
bool have_float_tex = !!gl_find_float16_format(gl, 1);
|
2014-12-23 01:48:58 +00:00
|
|
|
bool have_3d_tex = gl->mpgl_caps & MPGL_CAP_3D_TEX;
|
2016-05-12 18:52:26 +00:00
|
|
|
bool have_mglsl = gl->glsl_version >= 130; // modern GLSL (1st class arrays etc.)
|
2015-11-16 19:09:15 +00:00
|
|
|
bool have_texrg = gl->mpgl_caps & MPGL_CAP_TEX_RG;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-05-12 19:08:51 +00:00
|
|
|
const GLint auto_fbo_fmts[] = {GL_RGBA16, GL_RGBA16F, GL_RGB10_A2,
|
|
|
|
GL_RGBA8, 0};
|
|
|
|
GLint user_fbo_fmts[] = {p->opts.fbo_format, 0};
|
|
|
|
const GLint *fbo_fmts = user_fbo_fmts[0] ? user_fbo_fmts : auto_fbo_fmts;
|
|
|
|
bool have_fbo = false;
|
|
|
|
for (int n = 0; fbo_fmts[n]; n++) {
|
|
|
|
GLint fmt = fbo_fmts[n];
|
|
|
|
const struct gl_format *f = gl_find_internal_format(gl, fmt);
|
|
|
|
if (f && (f->flags & F_CF) == F_CF && test_fbo(p, fmt)) {
|
|
|
|
MP_VERBOSE(p, "Using FBO format 0x%x.\n", (unsigned)fmt);
|
|
|
|
have_fbo = true;
|
|
|
|
p->opts.fbo_format = fmt;
|
|
|
|
break;
|
|
|
|
}
|
2015-11-19 20:20:50 +00:00
|
|
|
}
|
|
|
|
|
2015-09-07 19:09:06 +00:00
|
|
|
if (gl->es && p->opts.pbo) {
|
|
|
|
p->opts.pbo = 0;
|
|
|
|
MP_WARN(p, "Disabling PBOs (GLES unsupported).\n");
|
|
|
|
}
|
|
|
|
|
2016-01-26 19:47:32 +00:00
|
|
|
p->forced_dumb_mode = p->opts.dumb_mode || !have_fbo || !have_texrg;
|
2015-11-19 20:22:24 +00:00
|
|
|
bool voluntarily_dumb = check_dumb_mode(p);
|
2016-01-26 19:47:32 +00:00
|
|
|
if (p->forced_dumb_mode || voluntarily_dumb) {
|
2015-11-19 20:22:24 +00:00
|
|
|
if (voluntarily_dumb) {
|
|
|
|
MP_VERBOSE(p, "No advanced processing required. Enabling dumb mode.\n");
|
|
|
|
} else if (!p->opts.dumb_mode) {
|
2015-09-07 19:09:06 +00:00
|
|
|
MP_WARN(p, "High bit depth FBOs unsupported. Enabling dumb mode.\n"
|
|
|
|
"Most extended features will be disabled.\n");
|
|
|
|
}
|
2015-11-19 20:22:24 +00:00
|
|
|
p->dumb_mode = true;
|
2015-09-07 19:09:06 +00:00
|
|
|
p->use_lut_3d = false;
|
2015-09-08 20:55:01 +00:00
|
|
|
// Most things don't work, so whitelist all options that still work.
|
|
|
|
struct gl_video_opts new_opts = {
|
|
|
|
.gamma = p->opts.gamma,
|
|
|
|
.gamma_auto = p->opts.gamma_auto,
|
|
|
|
.pbo = p->opts.pbo,
|
|
|
|
.fbo_format = p->opts.fbo_format,
|
|
|
|
.alpha_mode = p->opts.alpha_mode,
|
|
|
|
.use_rectangle = p->opts.use_rectangle,
|
|
|
|
.background = p->opts.background,
|
|
|
|
.dither_algo = -1,
|
|
|
|
};
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++)
|
|
|
|
new_opts.scaler[n] = gl_video_opts_def.scaler[n];
|
2015-09-08 20:55:01 +00:00
|
|
|
assign_options(&p->opts, &new_opts);
|
2015-09-09 18:40:04 +00:00
|
|
|
p->opts.deband_opts = m_config_alloc_struct(NULL, &deband_conf);
|
2015-09-07 19:09:06 +00:00
|
|
|
return;
|
2015-09-05 09:39:20 +00:00
|
|
|
}
|
2015-11-19 20:22:24 +00:00
|
|
|
p->dumb_mode = false;
|
2015-09-05 09:39:20 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
// Normally, we want to disable them by default if FBOs are unavailable,
|
|
|
|
// because they will be slow (not critically slow, but still slower).
|
|
|
|
// Without FP textures, we must always disable them.
|
2014-12-16 17:55:02 +00:00
|
|
|
// I don't know if luminance alpha float textures exist, so disregard them.
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (including parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
const struct filter_kernel *kernel =
|
|
|
|
mp_find_filter_kernel(p->opts.scaler[n].kernel.name);
|
2015-02-26 09:35:49 +00:00
|
|
|
if (kernel) {
|
|
|
|
char *reason = NULL;
|
|
|
|
if (!have_float_tex)
|
2015-07-27 21:18:19 +00:00
|
|
|
reason = "(float tex. missing)";
|
2016-05-12 18:52:26 +00:00
|
|
|
if (!have_mglsl)
|
|
|
|
reason = "(GLSL version too old)";
|
2015-02-26 09:35:49 +00:00
|
|
|
if (reason) {
|
2015-03-26 00:55:32 +00:00
|
|
|
p->opts.scaler[n].kernel.name = "bilinear";
|
2015-07-27 21:18:19 +00:00
|
|
|
MP_WARN(p, "Disabling scaler #%d %s.\n", n, reason);
|
2016-05-12 17:34:02 +00:00
|
|
|
if (n == SCALER_TSCALE)
|
|
|
|
p->opts.interpolation = 0;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-19 00:03:08 +00:00
|
|
|
// GLES3 doesn't provide filtered 16 bit integer textures
|
|
|
|
// GLES2 doesn't even provide 3D textures
|
2016-02-13 14:33:00 +00:00
|
|
|
if (p->use_lut_3d && (!have_3d_tex || gl->es)) {
|
2014-12-17 20:48:23 +00:00
|
|
|
p->use_lut_3d = false;
|
2015-04-11 17:24:54 +00:00
|
|
|
MP_WARN(p, "Disabling color management (GLES unsupported).\n");
|
2014-12-17 20:48:23 +00:00
|
|
|
}
|
|
|
|
|
2015-03-12 21:18:16 +00:00
|
|
|
int use_cms = p->opts.target_prim != MP_CSP_PRIM_AUTO ||
|
|
|
|
p->opts.target_trc != MP_CSP_TRC_AUTO || p->use_lut_3d;
|
2014-03-05 14:01:32 +00:00
|
|
|
|
2015-03-12 21:18:16 +00:00
|
|
|
// mix() is needed for some gamma functions
|
2016-05-12 18:52:26 +00:00
|
|
|
if (!have_mglsl && (p->opts.linear_scaling || p->opts.sigmoid_upscaling)) {
|
2015-03-12 21:18:16 +00:00
|
|
|
p->opts.linear_scaling = false;
|
|
|
|
p->opts.sigmoid_upscaling = false;
|
2015-04-11 17:24:54 +00:00
|
|
|
MP_WARN(p, "Disabling linear/sigmoid scaling (GLSL version too old).\n");
|
2015-03-12 21:18:16 +00:00
|
|
|
}
|
2016-05-12 18:52:26 +00:00
|
|
|
if (!have_mglsl && use_cms) {
|
2015-03-12 21:18:16 +00:00
|
|
|
p->opts.target_prim = MP_CSP_PRIM_AUTO;
|
|
|
|
p->opts.target_trc = MP_CSP_TRC_AUTO;
|
|
|
|
p->use_lut_3d = false;
|
2015-04-11 17:24:54 +00:00
|
|
|
MP_WARN(p, "Disabling color management (GLSL version too old).\n");
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2016-05-12 18:52:26 +00:00
|
|
|
if (!have_mglsl && p->opts.deband) {
|
2015-10-01 18:44:39 +00:00
|
|
|
p->opts.deband = 0;
|
|
|
|
MP_WARN(p, "Disabling debanding (GLSL version too old).\n");
|
|
|
|
}
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different ways, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantees 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number formats (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reasons (e.g. upscaling a still image with a low-end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always a better choice for doing NNEDI3
upscaling of a still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who want to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementations with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
|
2016-03-05 11:02:01 +00:00
|
|
|
if (p->opts.prescale_luma == 2) {
|
2015-10-28 01:37:55 +00:00
|
|
|
if (p->opts.nnedi3_opts->upload == NNEDI3_UPLOAD_UBO) {
|
|
|
|
// Check features for uniform buffer objects.
|
2015-12-02 00:28:26 +00:00
|
|
|
if (!gl->BindBufferBase || !gl->GetUniformBlockIndex) {
|
|
|
|
MP_WARN(p, "Disabling NNEDI3 (%s required).\n",
|
|
|
|
gl->es ? "OpenGL ES 3.0" : "OpenGL 3.1");
|
2016-03-05 11:02:01 +00:00
|
|
|
p->opts.prescale_luma = 0;
|
2015-10-28 01:37:55 +00:00
|
|
|
}
|
|
|
|
} else if (p->opts.nnedi3_opts->upload == NNEDI3_UPLOAD_SHADER) {
|
|
|
|
// Check features for hard coding approach.
|
2015-12-02 00:28:26 +00:00
|
|
|
if ((!gl->es && gl->glsl_version < 330) ||
|
|
|
|
(gl->es && gl->glsl_version < 300))
|
|
|
|
{
|
|
|
|
MP_WARN(p, "Disabling NNEDI3 (%s required).\n",
|
|
|
|
gl->es ? "OpenGL ES 3.0" : "OpenGL 3.3");
|
2016-03-05 11:02:01 +00:00
|
|
|
p->opts.prescale_luma = 0;
|
2015-10-28 01:37:55 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
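The NNEDI3 commit message above notes that the hard-coded-weights upload path
relies on GLSL's intBitsToFloat(), so that each network weight is reproduced
bit-exactly instead of going through a lossy decimal round trip. A tiny
standalone illustration of that encoding step (not mpv code; emit_weight() is
a made-up name):

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

// Print one float weight as a GLSL expression that reconstructs it bit-exactly.
static void emit_weight(float w)
{
    uint32_t bits;
    memcpy(&bits, &w, sizeof(bits));   // reinterpret the bits, no conversion
    printf("intBitsToFloat(%" PRId32 ")", (int32_t)bits);
}

int main(void)
{
    emit_weight(0.1f);    // 0.1f has no exact decimal representation...
    putchar('\n');
    emit_weight(-1.5f);   // ...but its bit pattern round-trips losslessly
    putchar('\n');
    return 0;
}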
2015-09-07 19:09:06 +00:00
|
|
|
static void init_gl(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
debug_check_gl(p, "before init_gl");
|
|
|
|
|
2015-12-19 10:56:19 +00:00
|
|
|
MP_VERBOSE(p, "Reported display depth: R=%d, G=%d, B=%d\n",
|
|
|
|
gl->fb_r, gl->fb_g, gl->fb_b);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->Disable(GL_DITHER);
|
|
|
|
|
2015-01-28 21:22:29 +00:00
|
|
|
gl_vao_init(&p->vao, gl, sizeof(struct vertex), vertex_vao);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2014-12-09 16:47:02 +00:00
|
|
|
gl_video_set_gl_state(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2014-12-24 15:54:47 +00:00
|
|
|
// Test whether we can use 10 bit. Hope that testing a single format/channel
|
|
|
|
// is good enough (instead of testing all 1-4 channel variants etc.).
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_unorm_format(gl, 2, 1);
|
|
|
|
if (gl->GetTexLevelParameteriv && fmt) {
|
2014-12-24 15:54:47 +00:00
|
|
|
GLuint tex;
|
|
|
|
gl->GenTextures(1, &tex);
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, tex);
|
|
|
|
gl->TexImage2D(GL_TEXTURE_2D, 0, fmt->internal_format, 64, 64, 0,
|
|
|
|
fmt->format, fmt->type, NULL);
|
|
|
|
GLenum pname = 0;
|
|
|
|
switch (fmt->format) {
|
|
|
|
case GL_RED: pname = GL_TEXTURE_RED_SIZE; break;
|
|
|
|
case GL_LUMINANCE: pname = GL_TEXTURE_LUMINANCE_SIZE; break;
|
|
|
|
}
|
|
|
|
GLint param = 0;
|
|
|
|
if (pname)
|
|
|
|
gl->GetTexLevelParameteriv(GL_TEXTURE_2D, 0, pname, &param);
|
|
|
|
if (param) {
|
|
|
|
MP_VERBOSE(p, "16 bit texture depth: %d.\n", (int)param);
|
|
|
|
p->texture_16bit_depth = param;
|
|
|
|
}
|
2015-02-27 21:13:15 +00:00
|
|
|
gl->DeleteTextures(1, &tex);
|
2014-12-24 15:54:47 +00:00
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
debug_check_gl(p, "after init_gl");
|
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_uninit(struct gl_video *p)
|
|
|
|
{
|
2014-12-03 21:37:39 +00:00
|
|
|
if (!p)
|
|
|
|
return;
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
uninit_video(p);
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to set up
every aspect of it separately (like compiling shaders, setting uniforms,
performing the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl_sc_destroy(p->sc);
|
|
|
|
|
2015-01-28 21:22:29 +00:00
|
|
|
gl_vao_uninit(&p->vao);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->DeleteTextures(1, &p->lut_3d_texture);
|
|
|
|
|
|
|
|
mpgl_osd_destroy(p->osd);
|
|
|
|
|
2015-01-29 14:50:21 +00:00
|
|
|
gl_set_debug_logger(gl, NULL);
|
2014-12-23 01:46:44 +00:00
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
assign_options(&p->opts, &(struct gl_video_opts){0});
|
2013-03-01 20:19:20 +00:00
|
|
|
talloc_free(p);
|
|
|
|
}
|
|
|
|
|
2014-12-09 16:47:02 +00:00
|
|
|
void gl_video_set_gl_state(struct gl_video *p)
|
2016-04-22 10:08:21 +00:00
|
|
|
{
|
|
|
|
// This resets certain important state to defaults.
|
|
|
|
gl_video_unset_gl_state(p);
|
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_unset_gl_state(struct gl_video *p)
|
2014-12-09 16:47:02 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2015-01-22 17:54:05 +00:00
|
|
|
if (gl->mpgl_caps & MPGL_CAP_ROW_LENGTH)
|
2014-12-18 23:58:56 +00:00
|
|
|
gl->PixelStorei(GL_UNPACK_ROW_LENGTH, 0);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 4);
|
2014-12-09 16:47:02 +00:00
|
|
|
}
|
|
|
|
|
2014-11-23 19:06:05 +00:00
|
|
|
void gl_video_reset(struct gl_video *p)
|
|
|
|
{
|
2015-03-12 21:18:16 +00:00
|
|
|
gl_video_reset_surfaces(p);
|
2014-11-23 19:06:05 +00:00
|
|
|
}
|
|
|
|
|
2015-02-04 22:37:38 +00:00
|
|
|
bool gl_video_showing_interpolated_frame(struct gl_video *p)
|
|
|
|
{
|
|
|
|
return p->is_interpolated;
|
|
|
|
}
|
|
|
|
|
2013-07-18 11:52:38 +00:00
|
|
|
// dest = src.<w> (always using 4 components)
|
2016-05-13 20:03:53 +00:00
|
|
|
static void packed_fmt_swizzle(char w[5], const struct packed_fmt_entry *fmt)
|
2013-07-18 11:52:38 +00:00
|
|
|
{
|
|
|
|
for (int c = 0; c < 4; c++)
|
2016-05-13 20:03:53 +00:00
|
|
|
w[c] = "rgba"[MPMAX(fmt->components[c] - 1, 0)];
|
2013-07-18 11:52:38 +00:00
|
|
|
w[4] = '\0';
|
|
|
|
}
|
|
|
|
|
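packed_fmt_swizzle() above maps the 1-based component indices of a
packed-format table entry onto an "rgba" swizzle string (an unused slot,
index 0, simply degrades to 'r'). A standalone illustration of the same
mapping with a made-up component layout (not mpv code):

#include <stdio.h>

#define MPMAX(a, b) ((a) > (b) ? (a) : (b))

// Same mapping as packed_fmt_swizzle(), with the component table passed in.
static void swizzle_from_components(char w[5], const int components[4])
{
    for (int c = 0; c < 4; c++)
        w[c] = "rgba"[MPMAX(components[c] - 1, 0)];
    w[4] = '\0';
}

int main(void)
{
    const int layout[4] = {3, 2, 1, 4};   // hypothetical B,G,R,A packed order
    char w[5];
    swizzle_from_components(w, layout);
    printf("swizzle: %s\n", w);           // prints "bgra"
    return 0;
}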
2016-05-13 19:46:08 +00:00
|
|
|
// Like gl_find_unorm_format(), but takes bits (not bytes), and if no fixed
|
2016-05-12 18:08:49 +00:00
|
|
|
// point format is available, returns an unsigned integer format.
|
2016-05-13 19:46:08 +00:00
|
|
|
static const struct gl_format *find_plane_format(GL *gl, int bits, int n_channels)
|
2016-01-26 19:47:32 +00:00
|
|
|
{
|
2016-05-13 19:46:08 +00:00
|
|
|
int bytes = (bits + 7) / 8;
|
|
|
|
const struct gl_format *f = gl_find_unorm_format(gl, bytes, n_channels);
|
2016-05-12 18:08:49 +00:00
|
|
|
if (f)
|
|
|
|
return f;
|
2016-05-13 19:46:08 +00:00
|
|
|
return gl_find_uint_format(gl, bytes, n_channels);
|
2016-01-26 19:47:32 +00:00
|
|
|
}
|
|
|
|
|
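find_plane_format() above rounds the per-component bit depth up to whole bytes
before looking up a texture format, which is why 9- to 16-bit planes all land
in 2-byte-per-component textures (and why init_gl() above probes the effective
16-bit texture depth). A trivial standalone check of that rounding (not mpv
code):

#include <stdio.h>

int main(void)
{
    const int bits[] = {8, 9, 10, 12, 16};
    for (int i = 0; i < 5; i++)
        printf("%2d bits -> %d byte(s) per component\n",
               bits[i], (bits[i] + 7) / 8);
    return 0;
}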
2016-05-10 16:29:10 +00:00
|
|
|
static void init_image_desc(struct gl_video *p, int fmt)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2016-05-10 16:29:10 +00:00
|
|
|
p->image_desc = mp_imgfmt_get_desc(fmt);
|
2014-12-16 17:55:02 +00:00
|
|
|
|
2016-05-10 16:29:10 +00:00
|
|
|
p->plane_count = p->image_desc.num_planes;
|
|
|
|
p->is_yuv = p->image_desc.flags & MP_IMGFLAG_YUV;
|
|
|
|
p->has_alpha = p->image_desc.flags & MP_IMGFLAG_ALPHA;
|
|
|
|
p->use_integer_conversion = false;
|
|
|
|
p->color_swizzle[0] = '\0';
|
|
|
|
p->is_packed_yuv = fmt == IMGFMT_UYVY || fmt == IMGFMT_YUYV;
|
|
|
|
p->hwdec_active = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
// test_only=true checks if the format is supported
|
|
|
|
// test_only=false also initializes some rendering parameters accordingly
|
2016-05-10 16:49:49 +00:00
|
|
|
static bool init_format(struct gl_video *p, int fmt, bool test_only)
|
2016-05-10 16:29:10 +00:00
|
|
|
{
|
2016-05-10 16:49:49 +00:00
|
|
|
struct GL *gl = p->gl;
|
2013-11-03 23:00:18 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
struct mp_imgfmt_desc desc = mp_imgfmt_get_desc(fmt);
|
|
|
|
if (!desc.id)
|
|
|
|
return false;
|
|
|
|
|
2013-03-28 19:48:53 +00:00
|
|
|
if (desc.num_planes > 4)
|
|
|
|
return false;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *plane_format[4] = {0};
|
2016-05-10 16:29:10 +00:00
|
|
|
char color_swizzle[5] = "";
|
2016-05-13 20:31:27 +00:00
|
|
|
const struct packed_fmt_entry *packed_format = NULL;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
// YUV/planar formats
|
2015-10-18 16:37:24 +00:00
|
|
|
if (desc.flags & (MP_IMGFLAG_YUV_P | MP_IMGFLAG_RGB_P)) {
|
2015-01-21 18:29:18 +00:00
|
|
|
int bits = desc.component_bits;
|
2013-03-28 19:48:53 +00:00
|
|
|
if ((desc.flags & MP_IMGFLAG_NE) && bits >= 8 && bits <= 16) {
|
2016-05-13 19:46:08 +00:00
|
|
|
plane_format[0] = find_plane_format(gl, bits, 1);
|
2016-05-10 16:49:49 +00:00
|
|
|
for (int n = 1; n < desc.num_planes; n++)
|
|
|
|
plane_format[n] = plane_format[0];
|
2015-10-18 16:37:24 +00:00
|
|
|
// RGB/planar
|
|
|
|
if (desc.flags & MP_IMGFLAG_RGB_P)
|
2016-05-10 16:29:10 +00:00
|
|
|
snprintf(color_swizzle, sizeof(color_swizzle), "brga");
|
2014-03-01 14:40:46 +00:00
|
|
|
goto supported;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-03-28 20:02:41 +00:00
|
|
|
// YUV/half-packed
|
2016-01-07 15:28:19 +00:00
|
|
|
if (desc.flags & MP_IMGFLAG_YUV_NV) {
|
|
|
|
int bits = desc.component_bits;
|
|
|
|
if ((desc.flags & MP_IMGFLAG_NE) && bits >= 8 && bits <= 16) {
|
2016-05-13 19:46:08 +00:00
|
|
|
plane_format[0] = find_plane_format(gl, bits, 1);
|
|
|
|
plane_format[1] = find_plane_format(gl, bits, 2);
|
2016-01-07 15:28:19 +00:00
|
|
|
if (desc.flags & MP_IMGFLAG_YUV_NV_SWAP)
|
2016-05-10 16:29:10 +00:00
|
|
|
snprintf(color_swizzle, sizeof(color_swizzle), "rbga");
|
2016-01-07 15:28:19 +00:00
|
|
|
goto supported;
|
|
|
|
}
|
2013-03-28 20:02:41 +00:00
|
|
|
}
|
|
|
|
|
2013-06-14 20:58:21 +00:00
|
|
|
// XYZ (same organization as RGB packed, but requires conversion matrix)
|
2014-03-01 14:40:46 +00:00
|
|
|
if (fmt == IMGFMT_XYZ12) {
|
2016-05-12 18:08:49 +00:00
|
|
|
plane_format[0] = gl_find_unorm_format(gl, 2, 3);
|
2014-03-01 14:40:46 +00:00
|
|
|
goto supported;
|
2013-05-01 21:59:00 +00:00
|
|
|
}
|
|
|
|
|
2013-07-18 11:52:38 +00:00
|
|
|
// Packed RGB(A) formats
|
2014-03-01 14:40:46 +00:00
|
|
|
for (const struct packed_fmt_entry *e = mp_packed_formats; e->fmt; e++) {
|
|
|
|
if (e->fmt == fmt) {
|
|
|
|
int n_comp = desc.bytes[0] / e->component_size;
|
2016-05-12 18:08:49 +00:00
|
|
|
plane_format[0] = gl_find_unorm_format(gl, e->component_size, n_comp);
|
2016-05-13 20:31:27 +00:00
|
|
|
packed_format = e;
|
2014-03-01 14:40:46 +00:00
|
|
|
goto supported;
|
2013-03-28 19:48:53 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
// Special formats for which OpenGL happens to have direct support.
|
|
|
|
plane_format[0] = gl_find_special_format(gl, fmt);
|
|
|
|
if (plane_format[0]) {
|
|
|
|
// Packed YUV Apple formats color permutation
|
|
|
|
if (plane_format[0]->format == GL_RGB_422_APPLE)
|
|
|
|
snprintf(color_swizzle, sizeof(color_swizzle), "gbra");
|
|
|
|
goto supported;
|
2013-11-13 20:52:34 +00:00
|
|
|
}
|
|
|
|
|
2014-03-01 14:40:46 +00:00
|
|
|
// Unsupported format
|
|
|
|
return false;
|
|
|
|
|
|
|
|
supported:
|
2013-07-18 11:52:38 +00:00
|
|
|
|
2015-01-29 14:50:21 +00:00
|
|
|
if (desc.component_bits > 8 && desc.component_bits < 16) {
|
2016-05-10 16:49:49 +00:00
|
|
|
if (p->texture_16bit_depth < 16)
|
2014-12-24 15:54:47 +00:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-01-26 19:47:32 +00:00
|
|
|
int use_integer = -1;
|
2016-05-10 16:49:49 +00:00
|
|
|
for (int n = 0; n < desc.num_planes; n++) {
|
2016-05-12 18:08:49 +00:00
|
|
|
if (!plane_format[n])
|
2014-12-17 20:48:23 +00:00
|
|
|
return false;
|
2016-05-12 18:08:49 +00:00
|
|
|
int use_int_plane = !!gl_integer_format_to_base(plane_format[n]->format);
|
2016-01-26 19:47:32 +00:00
|
|
|
if (use_integer < 0)
|
|
|
|
use_integer = use_int_plane;
|
|
|
|
if (use_integer != use_int_plane)
|
|
|
|
return false; // mixed planes not supported
|
2014-12-17 20:48:23 +00:00
|
|
|
}
|
2016-01-26 19:47:32 +00:00
|
|
|
|
2016-05-10 16:49:49 +00:00
|
|
|
if (use_integer && p->forced_dumb_mode)
|
2016-01-26 19:47:32 +00:00
|
|
|
return false;
|
2014-12-17 20:48:23 +00:00
|
|
|
|
2016-05-10 16:29:10 +00:00
|
|
|
if (!test_only) {
|
2016-05-10 16:49:49 +00:00
|
|
|
for (int n = 0; n < desc.num_planes; n++) {
|
|
|
|
struct texplane *plane = &p->image.planes[n];
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *format = plane_format[n];
|
2016-05-10 16:29:10 +00:00
|
|
|
assert(format);
|
|
|
|
plane->gl_format = format->format;
|
|
|
|
plane->gl_internal_format = format->internal_format;
|
|
|
|
plane->gl_type = format->type;
|
|
|
|
plane->use_integer = use_integer;
|
2016-05-13 20:31:27 +00:00
|
|
|
snprintf(plane->swizzle, sizeof(plane->swizzle), "rgba");
|
|
|
|
if (packed_format)
|
|
|
|
packed_fmt_swizzle(plane->swizzle, packed_format);
|
2016-05-10 16:29:10 +00:00
|
|
|
if (plane->gl_format == GL_LUMINANCE_ALPHA)
|
2016-05-13 20:31:27 +00:00
|
|
|
MPSWAP(char, plane->swizzle[1], plane->swizzle[3]);
|
2016-05-10 16:29:10 +00:00
|
|
|
}
|
2013-07-18 11:52:38 +00:00
|
|
|
|
2016-05-10 16:49:49 +00:00
|
|
|
init_image_desc(p, fmt);
|
2016-05-10 16:29:10 +00:00
|
|
|
|
2016-05-10 16:49:49 +00:00
|
|
|
p->use_integer_conversion = use_integer;
|
|
|
|
snprintf(p->color_swizzle, sizeof(p->color_swizzle), "%s", color_swizzle);
|
2016-05-10 16:29:10 +00:00
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2013-11-03 23:00:18 +00:00
|
|
|
bool gl_video_check_format(struct gl_video *p, int mp_format)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2016-05-10 16:29:10 +00:00
|
|
|
if (init_format(p, mp_format, true))
|
|
|
|
return true;
|
|
|
|
if (p->hwdec && p->hwdec->driver->imgfmt == mp_format)
|
|
|
|
return true;
|
|
|
|
return false;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2013-06-07 23:35:44 +00:00
|
|
|
void gl_video_config(struct gl_video *p, struct mp_image_params *params)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-22 17:29:37 +00:00
|
|
|
mp_image_unrefp(&p->image.mpi);
|
2013-11-05 18:08:44 +00:00
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
if (!mp_image_params_equal(&p->real_image_params, params)) {
|
2013-11-05 18:08:44 +00:00
|
|
|
uninit_video(p);
|
2015-01-29 18:53:49 +00:00
|
|
|
p->real_image_params = *params;
|
|
|
|
p->image_params = *params;
|
2014-12-09 20:36:45 +00:00
|
|
|
if (params->imgfmt)
|
2015-01-29 18:53:49 +00:00
|
|
|
init_video(p);
|
2013-11-05 18:08:44 +00:00
|
|
|
}
|
2014-11-07 14:28:12 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seem
to work consistently anyway (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some cases, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
gl_video_reset_surfaces(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-03-23 15:32:59 +00:00
|
|
|
void gl_video_set_osd_source(struct gl_video *p, struct osd_state *osd)
|
|
|
|
{
|
|
|
|
mpgl_osd_destroy(p->osd);
|
|
|
|
p->osd = NULL;
|
|
|
|
p->osd_state = osd;
|
|
|
|
recreate_osd(p);
|
|
|
|
}
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
struct gl_video *gl_video_init(GL *gl, struct mp_log *log, struct mpv_global *g,
|
|
|
|
struct gl_lcms *cms)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-21 19:32:42 +00:00
|
|
|
if (gl->version < 210 && gl->es < 200) {
|
2014-12-22 11:49:20 +00:00
|
|
|
mp_err(log, "At least OpenGL 2.1 or OpenGL ES 2.0 required.\n");
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
struct gl_video *p = talloc_ptrtype(NULL, p);
|
|
|
|
*p = (struct gl_video) {
|
|
|
|
.gl = gl,
|
2015-03-27 12:27:40 +00:00
|
|
|
.global = g,
|
2013-07-31 19:44:21 +00:00
|
|
|
.log = log,
|
2016-02-13 14:33:00 +00:00
|
|
|
.cms = cms,
|
2013-03-01 20:19:20 +00:00
|
|
|
.opts = gl_video_opts_def,
|
2014-12-24 15:54:47 +00:00
|
|
|
.texture_16bit_depth = 16,
|
2015-09-23 20:13:03 +00:00
|
|
|
.sc = gl_sc_create(gl, log),
|
2013-03-01 20:19:20 +00:00
|
|
|
};
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++)
|
|
|
|
p->scaler[n] = (struct scaler){.index = n};
|
2014-12-23 01:46:44 +00:00
|
|
|
gl_video_set_debug(p, true);
|
vo_opengl: restore single pass optimization as separate code path
The single-pass optimization, rendering the video in one shader pass and
without FBO indirections, was removed some commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive note, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
init_gl(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
recreate_osd(p);
|
|
|
|
return p;
|
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Get static string for scaler shader. If "tscale" is set to true, the
|
|
|
|
// scaler must be a separable convolution filter.
|
|
|
|
static const char *handle_scaler_opt(const char *name, bool tscale)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-26 01:03:44 +00:00
|
|
|
if (name && name[0]) {
|
2013-03-01 20:19:20 +00:00
|
|
|
const struct filter_kernel *kernel = mp_find_filter_kernel(name);
|
2015-03-13 18:30:31 +00:00
|
|
|
if (kernel && (!tscale || !kernel->polar))
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
e.g. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
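As a rough sketch of what "applying a window function to an arbitrary filter" means here, the window is stretched to the kernel's radius and multiplied into the kernel weight. This is a hypothetical helper, not the actual filter_kernels.c code; the .f/.w field names and the weight callback signature are assumptions.

// Hedged sketch: evaluate a windowed kernel at position x.
static double windowed_weight(struct filter_kernel *k, double x)
{
    double w = 1.0;
    if (k->w.weight) // stretch the window so it covers the kernel's radius
        w = k->w.weight(&k->w, x / k->f.radius * k->w.radius);
    return w * k->f.weight(&k->f, x);
}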
|
|
|
return kernel->f.name;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-03-15 06:11:51 +00:00
|
|
|
for (const char *const *filter = tscale ? fixed_tscale_filters
|
|
|
|
: fixed_scale_filters;
|
|
|
|
*filter; filter++) {
|
|
|
|
if (strcmp(*filter, name) == 0)
|
2013-03-01 20:19:20 +00:00
|
|
|
return *filter;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2015-06-09 20:30:32 +00:00
|
|
|
static char **dup_str_array(void *parent, char **src)
|
|
|
|
{
|
|
|
|
if (!src)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
char **res = talloc_new(parent);
|
|
|
|
int num = 0;
|
|
|
|
for (int n = 0; src && src[n]; n++)
|
|
|
|
MP_TARRAY_APPEND(res, res, num, talloc_strdup(res, src[n]));
|
|
|
|
MP_TARRAY_APPEND(res, res, num, NULL);
|
|
|
|
return res;
|
|
|
|
}
|
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
static void assign_options(struct gl_video_opts *dst, struct gl_video_opts *src)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-09-08 20:46:36 +00:00
|
|
|
talloc_free(dst->scale_shader);
|
|
|
|
talloc_free(dst->pre_shaders);
|
|
|
|
talloc_free(dst->post_shaders);
|
2016-04-20 23:33:13 +00:00
|
|
|
talloc_free(dst->user_shaders);
|
2015-09-09 18:40:04 +00:00
|
|
|
talloc_free(dst->deband_opts);
|
2015-10-26 22:43:48 +00:00
|
|
|
talloc_free(dst->superxbr_opts);
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with the placebo setting) in two different ways: via a uniform buffer
object or hard-coding into the shader source. A UBO requires OpenGL 3.1,
which only guarantees 16kb per block. But I find that 64kb seems to be
a default setting for recent cards/drivers (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number formats (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reasons (e.g. upscaling a still image with a low-end
graphics card). But as I tested, it's rather slow even with 1D
texture (we would probably have to use a 2D texture due to dimension size
limitations). Since there is always a better choice for NNEDI3
upscaling of still images (the vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who want to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
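To make the "hard-coding into shader" route above concrete, here is a minimal sketch (a hypothetical helper, not the actual mpv code) of emitting one weight into the generated GLSL source as an exact bit pattern, which intBitsToFloat() then reconstructs losslessly on the GPU:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Print one NN weight as "intBitsToFloat(<bits>)" into the shader source,
// avoiding any rounding by the GLSL compiler's float literal parser.
// (Relies on the usual two's-complement reinterpretation of the bits.)
static void emit_weight(char *buf, size_t size, float w)
{
    uint32_t bits;
    memcpy(&bits, &w, sizeof(bits));   // reinterpret the float's bit pattern
    snprintf(buf, size, "intBitsToFloat(%" PRId32 ")", (int32_t)bits);
}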
|
|
|
talloc_free(dst->nnedi3_opts);
|
2015-06-09 20:30:32 +00:00
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
*dst = *src;
|
2015-06-09 20:30:32 +00:00
|
|
|
|
2015-09-09 18:40:04 +00:00
|
|
|
if (src->deband_opts)
|
|
|
|
dst->deband_opts = m_sub_options_copy(NULL, &deband_conf, src->deband_opts);
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
if (src->superxbr_opts) {
|
|
|
|
dst->superxbr_opts = m_sub_options_copy(NULL, &superxbr_conf,
|
|
|
|
src->superxbr_opts);
|
|
|
|
}
|
|
|
|
|
2015-10-28 01:37:55 +00:00
|
|
|
if (src->nnedi3_opts) {
|
|
|
|
dst->nnedi3_opts = m_sub_options_copy(NULL, &nnedi3_conf,
|
|
|
|
src->nnedi3_opts);
|
|
|
|
}
|
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++) {
|
2015-09-08 20:46:36 +00:00
|
|
|
dst->scaler[n].kernel.name =
|
2016-03-05 08:42:57 +00:00
|
|
|
(char *)handle_scaler_opt(dst->scaler[n].kernel.name,
|
|
|
|
n == SCALER_TSCALE);
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (including parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
}
|
2015-03-13 18:30:31 +00:00
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
dst->scale_shader = talloc_strdup(NULL, dst->scale_shader);
|
|
|
|
dst->pre_shaders = dup_str_array(NULL, dst->pre_shaders);
|
|
|
|
dst->post_shaders = dup_str_array(NULL, dst->post_shaders);
|
2016-04-20 23:33:13 +00:00
|
|
|
dst->user_shaders = dup_str_array(NULL, dst->user_shaders);
|
2015-09-08 20:46:36 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// Set the options, and possibly update the filter chain too.
|
|
|
|
// Note: assumes all options are valid and verified by the option parser.
|
|
|
|
void gl_video_set_options(struct gl_video *p, struct gl_video_opts *opts)
|
|
|
|
{
|
|
|
|
assign_options(&p->opts, opts);
|
2015-07-16 20:43:40 +00:00
|
|
|
|
|
|
|
check_gl_features(p);
|
|
|
|
uninit_rendering(p);
|
2015-11-28 18:59:11 +00:00
|
|
|
|
|
|
|
if (p->opts.interpolation && !p->global->opts->video_sync && !p->dsi_warned) {
|
|
|
|
MP_WARN(p, "Interpolation now requires enabling display-sync mode.\n"
|
|
|
|
"E.g.: --video-sync=display-resample\n");
|
|
|
|
p->dsi_warned = true;
|
|
|
|
}
|
2015-07-16 20:43:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_configure_queue(struct gl_video *p, struct vo *vo)
|
|
|
|
{
|
2015-07-20 19:12:46 +00:00
|
|
|
int queue_size = 1;
|
2015-07-16 20:43:40 +00:00
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Figure out an adequate size for the interpolation queue. The larger
|
2015-06-26 08:59:57 +00:00
|
|
|
// the radius, the earlier we need to queue frames.
|
2015-07-16 20:43:40 +00:00
|
|
|
if (p->opts.interpolation) {
|
2015-03-26 00:55:32 +00:00
|
|
|
const struct filter_kernel *kernel =
|
2016-03-05 08:42:57 +00:00
|
|
|
mp_find_filter_kernel(p->opts.scaler[SCALER_TSCALE].kernel.name);
|
2015-03-13 18:30:31 +00:00
|
|
|
if (kernel) {
|
2015-03-25 03:40:28 +00:00
|
|
|
double radius = kernel->f.radius;
|
2016-03-05 08:42:57 +00:00
|
|
|
radius = radius > 0 ? radius : p->opts.scaler[SCALER_TSCALE].radius;
|
2015-07-20 19:12:46 +00:00
|
|
|
queue_size += 1 + ceil(radius);
|
2015-07-11 11:55:45 +00:00
|
|
|
} else {
|
|
|
|
// Oversample case
|
2015-07-20 19:12:46 +00:00
|
|
|
queue_size += 2;
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-11-25 21:10:55 +00:00
|
|
|
vo_set_queue_params(vo, 0, queue_size);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
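As a worked example of the queue sizing above (kernel radii come from filter_kernels; mitchell's radius of 2 is assumed here):

/* tscale=mitchell (radius 2):    queue_size = 1 + 1 + ceil(2) = 4 frames
 * tscale without a kernel
 * (the oversample case above):   queue_size = 1 + 2           = 3 frames
 * interpolation disabled:        queue_size = 1 frame                     */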
|
|
|
|
|
2015-01-06 16:34:29 +00:00
|
|
|
struct mp_csp_equalizer *gl_video_eq_ptr(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-06 16:34:29 +00:00
|
|
|
return &p->video_eq;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-01-06 16:34:29 +00:00
|
|
|
// Call when the mp_csp_equalizer returned by gl_video_eq_ptr() was changed.
|
|
|
|
void gl_video_eq_update(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2013-12-21 19:03:36 +00:00
|
|
|
static int validate_scaler_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-22 18:58:22 +00:00
|
|
|
char s[20] = {0};
|
|
|
|
int r = 1;
|
2015-03-13 18:30:31 +00:00
|
|
|
bool tscale = bstr_equals0(name, "tscale");
|
2013-07-22 00:14:15 +00:00
|
|
|
if (bstr_equals0(param, "help")) {
|
2015-01-22 18:58:22 +00:00
|
|
|
r = M_OPT_EXIT - 1;
|
|
|
|
} else {
|
|
|
|
snprintf(s, sizeof(s), "%.*s", BSTR_P(param));
|
2015-03-13 18:30:31 +00:00
|
|
|
if (!handle_scaler_opt(s, tscale))
|
2015-01-22 18:58:22 +00:00
|
|
|
r = M_OPT_INVALID;
|
|
|
|
}
|
|
|
|
if (r < 1) {
|
2013-12-21 19:03:36 +00:00
|
|
|
mp_info(log, "Available scalers:\n");
|
2015-03-15 06:11:51 +00:00
|
|
|
for (const char *const *filter = tscale ? fixed_tscale_filters
|
|
|
|
: fixed_scale_filters;
|
|
|
|
*filter; filter++) {
|
|
|
|
mp_info(log, " %s\n", *filter);
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
2015-03-25 03:40:28 +00:00
|
|
|
for (int n = 0; mp_filter_kernels[n].f.name; n++) {
|
2015-03-13 18:30:31 +00:00
|
|
|
if (!tscale || !mp_filter_kernels[n].polar)
|
2015-03-25 03:40:28 +00:00
|
|
|
mp_info(log, " %s\n", mp_filter_kernels[n].f.name);
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
2015-01-22 18:58:22 +00:00
|
|
|
if (s[0])
|
|
|
|
mp_fatal(log, "No scaler named '%s' found!\n", s);
|
2013-07-22 00:14:15 +00:00
|
|
|
}
|
2015-01-22 18:58:22 +00:00
|
|
|
return r;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2013-03-15 19:17:33 +00:00
|
|
|
|
2015-03-25 03:40:28 +00:00
|
|
|
static int validate_window_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param)
|
|
|
|
{
|
|
|
|
char s[20] = {0};
|
|
|
|
int r = 1;
|
|
|
|
if (bstr_equals0(param, "help")) {
|
|
|
|
r = M_OPT_EXIT - 1;
|
|
|
|
} else {
|
|
|
|
snprintf(s, sizeof(s), "%.*s", BSTR_P(param));
|
|
|
|
const struct filter_window *window = mp_find_filter_window(s);
|
|
|
|
if (!window)
|
|
|
|
r = M_OPT_INVALID;
|
|
|
|
}
|
|
|
|
if (r < 1) {
|
|
|
|
mp_info(log, "Available windows:\n");
|
|
|
|
for (int n = 0; mp_filter_windows[n].name; n++)
|
|
|
|
mp_info(log, " %s\n", mp_filter_windows[n].name);
|
|
|
|
if (s[0])
|
|
|
|
mp_fatal(log, "No window named '%s' found!\n", s);
|
|
|
|
}
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2015-02-07 12:54:18 +00:00
|
|
|
float gl_video_scale_ambient_lux(float lmin, float lmax,
|
|
|
|
float rmin, float rmax, float lux)
|
|
|
|
{
|
|
|
|
assert(lmax > lmin);
|
|
|
|
|
|
|
|
float num = (rmax - rmin) * (log10(lux) - log10(lmin));
|
|
|
|
float den = log10(lmax) - log10(lmin);
|
|
|
|
float result = num / den + rmin;
|
|
|
|
|
|
|
|
// clamp the result
|
|
|
|
float max = MPMAX(rmax, rmin);
|
|
|
|
float min = MPMIN(rmax, rmin);
|
|
|
|
return MPMAX(MPMIN(result, max), min);
|
|
|
|
}
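A quick worked example, using the same constants gl_video_set_ambient_lux() passes in below:

/* gl_video_scale_ambient_lux(16, 64, 2.40, 1.961, 16) == 2.40    (dark room)
 * gl_video_scale_ambient_lux(16, 64, 2.40, 1.961, 32) ~= 2.18    (halfway on the log10 scale)
 * gl_video_scale_ambient_lux(16, 64, 2.40, 1.961, 64) == 1.961   (bright room)
 * For lux values outside [lmin, lmax], the result is clamped to the range
 * spanned by rmin and rmax. */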
|
|
|
|
|
|
|
|
void gl_video_set_ambient_lux(struct gl_video *p, int lux)
|
|
|
|
{
|
|
|
|
if (p->opts.gamma_auto) {
|
|
|
|
float gamma = gl_video_scale_ambient_lux(16.0, 64.0, 2.40, 1.961, lux);
|
2015-03-04 19:18:14 +00:00
|
|
|
MP_VERBOSE(p, "ambient light changed: %dlux (gamma: %f)\n", lux, gamma);
|
2015-02-07 12:54:18 +00:00
|
|
|
p->opts.gamma = MPMIN(1.0, 1.961 / gamma);
|
|
|
|
gl_video_eq_update(p);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-11-03 23:00:18 +00:00
|
|
|
void gl_video_set_hwdec(struct gl_video *p, struct gl_hwdec *hwdec)
|
|
|
|
{
|
|
|
|
p->hwdec = hwdec;
|
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
2013-11-03 23:00:18 +00:00
|
|
|
}
|