2013-03-01 20:19:20 +00:00
|
|
|
/*
|
|
|
|
* This file is part of mpv.
|
|
|
|
*
|
2016-01-19 17:36:34 +00:00
|
|
|
* mpv is free software; you can redistribute it and/or
|
|
|
|
* modify it under the terms of the GNU Lesser General Public
|
|
|
|
* License as published by the Free Software Foundation; either
|
|
|
|
* version 2.1 of the License, or (at your option) any later version.
|
2013-03-01 20:19:20 +00:00
|
|
|
*
|
|
|
|
* mpv is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
2016-01-19 17:36:34 +00:00
|
|
|
* GNU Lesser General Public License for more details.
|
2013-03-01 20:19:20 +00:00
|
|
|
*
|
2016-01-19 17:36:34 +00:00
|
|
|
* You should have received a copy of the GNU Lesser General Public
|
|
|
|
* License along with mpv. If not, see <http://www.gnu.org/licenses/>.
|
2013-03-01 20:19:20 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
#include <assert.h>
|
2013-03-30 03:01:17 +00:00
|
|
|
#include <math.h>
|
|
|
|
#include <stdbool.h>
|
|
|
|
#include <string.h>
|
|
|
|
#include <assert.h>
|
|
|
|
|
|
|
|
#include <libavutil/common.h>
|
2015-03-27 12:27:40 +00:00
|
|
|
#include <libavutil/lfg.h>
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-08-29 02:12:56 +00:00
|
|
|
#include "video.h"
|
2013-03-30 03:01:17 +00:00
|
|
|
|
2014-08-29 10:09:04 +00:00
|
|
|
#include "misc/bstr.h"
|
2015-09-09 18:40:04 +00:00
|
|
|
#include "options/m_config.h"
|
2015-11-28 18:59:11 +00:00
|
|
|
#include "common/global.h"
|
|
|
|
#include "options/options.h"
|
2015-08-29 02:12:56 +00:00
|
|
|
#include "common.h"
|
2016-05-12 18:08:49 +00:00
|
|
|
#include "formats.h"
|
2015-08-29 02:12:56 +00:00
|
|
|
#include "utils.h"
|
|
|
|
#include "hwdec.h"
|
|
|
|
#include "osd.h"
|
2015-09-23 20:13:03 +00:00
|
|
|
#include "stream/stream.h"
|
2015-10-26 22:43:48 +00:00
|
|
|
#include "superxbr.h"
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
#include "nnedi3.h"
|
2015-09-05 12:03:00 +00:00
|
|
|
#include "video_shaders.h"
|
2015-08-28 23:10:30 +00:00
|
|
|
#include "video/out/filter_kernels.h"
|
|
|
|
#include "video/out/aspect.h"
|
|
|
|
#include "video/out/bitmap_packer.h"
|
|
|
|
#include "video/out/dither.h"
|
|
|
|
#include "video/out/vo.h"
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
// Maximal number of passes that prescaler can be applied.
|
|
|
|
#define MAX_PRESCALE_PASSES 5
|
|
|
|
|
|
|
|
// Maximal number of steps each pass of prescaling contains
|
|
|
|
#define MAX_PRESCALE_STEPS 2
|
|
|
|
|
2015-01-20 20:46:19 +00:00
|
|
|
// scale/cscale arguments that map directly to shader filter routines.
|
2013-03-01 20:19:20 +00:00
|
|
|
// Note that the convolution filters are not included in this list.
|
2014-06-10 21:56:05 +00:00
|
|
|
static const char *const fixed_scale_filters[] = {
|
2013-03-01 20:19:20 +00:00
|
|
|
"bilinear",
|
|
|
|
"bicubic_fast",
|
2015-03-15 05:27:11 +00:00
|
|
|
"oversample",
|
2015-03-27 12:27:40 +00:00
|
|
|
"custom",
|
2013-03-01 20:19:20 +00:00
|
|
|
NULL
|
|
|
|
};
|
2015-03-15 06:11:51 +00:00
|
|
|
static const char *const fixed_tscale_filters[] = {
|
2015-07-11 11:55:45 +00:00
|
|
|
"oversample",
|
2015-03-15 06:11:51 +00:00
|
|
|
NULL
|
|
|
|
};
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
// must be sorted, and terminated with 0
|
2014-12-08 16:08:26 +00:00
|
|
|
int filter_sizes[] =
|
|
|
|
{2, 4, 6, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 0};
|
2015-03-13 18:30:31 +00:00
|
|
|
int tscale_sizes[] = {2, 4, 6, 0}; // limited by TEXUNIT_VIDEO_NUM
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct vertex_pt {
|
|
|
|
float x, y;
|
|
|
|
};
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
struct vertex {
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct vertex_pt position;
|
2015-03-13 12:42:05 +00:00
|
|
|
struct vertex_pt texcoord[TEXUNIT_VIDEO_NUM];
|
2013-03-01 20:19:20 +00:00
|
|
|
};
|
|
|
|
|
2015-01-28 21:22:29 +00:00
|
|
|
static const struct gl_vao_entry vertex_vao[] = {
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{"position", 2, GL_FLOAT, false, offsetof(struct vertex, position)},
|
|
|
|
{"texcoord0", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[0])},
|
|
|
|
{"texcoord1", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[1])},
|
|
|
|
{"texcoord2", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[2])},
|
|
|
|
{"texcoord3", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[3])},
|
2015-03-13 12:42:05 +00:00
|
|
|
{"texcoord4", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[4])},
|
|
|
|
{"texcoord5", 2, GL_FLOAT, false, offsetof(struct vertex, texcoord[5])},
|
2015-01-28 21:22:29 +00:00
|
|
|
{0}
|
|
|
|
};
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
struct texplane {
|
2013-03-28 19:40:19 +00:00
|
|
|
int w, h;
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
int tex_w, tex_h;
|
2013-03-28 19:48:53 +00:00
|
|
|
GLint gl_internal_format;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLenum gl_target;
|
2016-01-26 19:47:32 +00:00
|
|
|
bool use_integer;
|
2013-03-28 19:48:53 +00:00
|
|
|
GLenum gl_format;
|
|
|
|
GLenum gl_type;
|
2013-03-01 20:19:20 +00:00
|
|
|
GLuint gl_texture;
|
|
|
|
int gl_buffer;
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
char swizzle[5];
|
2013-03-01 20:19:20 +00:00
|
|
|
};
|
|
|
|
|
2013-03-28 19:40:19 +00:00
|
|
|
struct video_image {
|
2013-03-28 20:02:53 +00:00
|
|
|
struct texplane planes[4];
|
2013-03-28 19:40:19 +00:00
|
|
|
bool image_flipped;
|
2015-01-22 17:29:37 +00:00
|
|
|
struct mp_image *mpi; // original input image
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
bool hwdec_mapped;
|
2013-03-28 19:40:19 +00:00
|
|
|
};
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
enum plane_type {
|
|
|
|
PLANE_NONE = 0,
|
|
|
|
PLANE_RGB,
|
|
|
|
PLANE_LUMA,
|
|
|
|
PLANE_CHROMA,
|
|
|
|
PLANE_ALPHA,
|
|
|
|
PLANE_XYZ,
|
2014-11-23 19:06:05 +00:00
|
|
|
};
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// A self-contained description of a source image which can be bound to a
|
|
|
|
// texture unit and sampled from. Contains metadata about how it's to be used
|
|
|
|
struct img_tex {
|
|
|
|
enum plane_type type; // must be set to something non-zero
|
|
|
|
int components; // number of relevant coordinates
|
|
|
|
float multiplier; // multiplier to be used when sampling
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLuint gl_tex;
|
|
|
|
GLenum gl_target;
|
2016-01-26 19:47:32 +00:00
|
|
|
bool use_integer;
|
2016-04-08 20:21:31 +00:00
|
|
|
int tex_w, tex_h; // source texture size
|
|
|
|
int w, h; // logical size (with pre_transform applied)
|
|
|
|
struct gl_transform pre_transform; // source texture space
|
|
|
|
struct gl_transform transform; // rendering transformation
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
char swizzle[5];
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
struct fbosurface {
|
|
|
|
struct fbotex fbotex;
|
|
|
|
double pts;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
};
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
#define FBOSURFACES_MAX 10
|
|
|
|
|
2015-09-23 20:13:03 +00:00
|
|
|
struct cached_file {
|
|
|
|
char *path;
|
|
|
|
char *body;
|
|
|
|
};
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
struct gl_video {
|
|
|
|
GL *gl;
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
struct mpv_global *global;
|
2013-07-31 19:44:21 +00:00
|
|
|
struct mp_log *log;
|
2013-03-01 20:19:20 +00:00
|
|
|
struct gl_video_opts opts;
|
2016-02-13 14:33:00 +00:00
|
|
|
struct gl_lcms *cms;
|
2013-03-01 20:19:20 +00:00
|
|
|
bool gl_debug;
|
|
|
|
|
2014-12-24 15:54:47 +00:00
|
|
|
int texture_16bit_depth; // actual bits available in 16 bit textures
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct gl_shader_cache *sc;
|
|
|
|
|
2015-01-28 21:22:29 +00:00
|
|
|
struct gl_vao vao;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2014-06-15 18:46:57 +00:00
|
|
|
struct osd_state *osd_state;
|
2013-03-01 20:19:20 +00:00
|
|
|
struct mpgl_osd *osd;
|
2014-06-15 18:46:57 +00:00
|
|
|
double osd_pts;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
GLuint lut_3d_texture;
|
|
|
|
bool use_lut_3d;
|
|
|
|
|
|
|
|
GLuint dither_texture;
|
|
|
|
int dither_size;
|
|
|
|
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
GLuint nnedi3_weights_buffer;
|
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
struct mp_image_params real_image_params; // configured format
|
|
|
|
struct mp_image_params image_params; // texture format (mind hwdec case)
|
2014-10-16 21:51:36 +00:00
|
|
|
struct mp_imgfmt_desc image_desc;
|
2015-01-29 14:50:21 +00:00
|
|
|
int plane_count;
|
2013-03-28 19:40:19 +00:00
|
|
|
|
2015-12-07 22:45:41 +00:00
|
|
|
bool is_yuv, is_packed_yuv;
|
2013-07-18 11:52:38 +00:00
|
|
|
bool has_alpha;
|
|
|
|
char color_swizzle[5];
|
2016-01-26 19:47:32 +00:00
|
|
|
bool use_integer_conversion;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2013-03-28 19:40:19 +00:00
|
|
|
struct video_image image;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-11-19 20:22:24 +00:00
|
|
|
bool dumb_mode;
|
2016-01-26 19:47:32 +00:00
|
|
|
bool forced_dumb_mode;
|
2015-11-19 20:22:24 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct fbotex merge_fbo[4];
|
|
|
|
struct fbotex deband_fbo[4];
|
|
|
|
struct fbotex scale_fbo[4];
|
|
|
|
struct fbotex integer_fbo[4];
|
2015-03-27 12:27:40 +00:00
|
|
|
struct fbotex indirect_fbo;
|
2015-03-23 01:42:19 +00:00
|
|
|
struct fbotex blend_subs_fbo;
|
2015-09-23 20:43:27 +00:00
|
|
|
struct fbotex unsharp_fbo;
|
2015-09-05 10:02:02 +00:00
|
|
|
struct fbotex output_fbo;
|
2014-11-23 19:06:05 +00:00
|
|
|
struct fbosurface surfaces[FBOSURFACES_MAX];
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
// these are duplicated so we can keep rendering back and forth between
|
|
|
|
// them to support an unlimited number of shader passes per step
|
|
|
|
struct fbotex pre_fbo[2];
|
|
|
|
struct fbotex post_fbo[2];
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
struct fbotex prescale_fbo[MAX_PRESCALE_PASSES][MAX_PRESCALE_STEPS];
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
int surface_idx;
|
|
|
|
int surface_now;
|
2015-07-02 11:17:20 +00:00
|
|
|
int frames_drawn;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
bool is_interpolated;
|
2015-09-05 10:02:02 +00:00
|
|
|
bool output_fbo_valid;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
// state for configured scalers
|
|
|
|
struct scaler scaler[SCALER_COUNT];
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
struct mp_csp_equalizer video_eq;
|
|
|
|
|
|
|
|
struct mp_rect src_rect; // displayed part of the source video
|
|
|
|
struct mp_rect dst_rect; // video rectangle on output window
|
|
|
|
struct mp_osd_res osd_rect; // OSD size/margins
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
int vp_w, vp_h;
|
|
|
|
|
|
|
|
// temporary during rendering
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct img_tex pass_tex[TEXUNIT_VIDEO_NUM];
|
|
|
|
int pass_tex_num;
|
2015-10-23 17:52:03 +00:00
|
|
|
int texture_w, texture_h;
|
2015-10-26 22:43:48 +00:00
|
|
|
struct gl_transform texture_offset; // texture transform without rotation
|
2016-03-05 11:38:51 +00:00
|
|
|
int components;
|
2015-03-15 21:52:34 +00:00
|
|
|
bool use_linear;
|
2015-03-15 23:09:36 +00:00
|
|
|
float user_gamma;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
int frames_uploaded;
|
2013-03-01 20:19:20 +00:00
|
|
|
int frames_rendered;
|
2015-03-27 12:27:40 +00:00
|
|
|
AVLFG lfg;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2013-05-25 23:48:39 +00:00
|
|
|
// Cached because computing it can take relatively long
|
|
|
|
int last_dither_matrix_size;
|
|
|
|
float *last_dither_matrix;
|
|
|
|
|
2015-09-23 20:13:03 +00:00
|
|
|
struct cached_file files[10];
|
|
|
|
int num_files;
|
|
|
|
|
2013-11-03 23:00:18 +00:00
|
|
|
struct gl_hwdec *hwdec;
|
|
|
|
bool hwdec_active;
|
2015-11-28 18:59:11 +00:00
|
|
|
|
|
|
|
bool dsi_warned;
|
2016-01-25 19:24:41 +00:00
|
|
|
bool custom_shader_fn_warned;
|
2013-03-01 20:19:20 +00:00
|
|
|
};
|
|
|
|
|
2013-07-18 11:52:38 +00:00
|
|
|
struct packed_fmt_entry {
|
|
|
|
int fmt;
|
|
|
|
int8_t component_size;
|
|
|
|
int8_t components[4]; // source component - 0 means unmapped
|
|
|
|
};
|
|
|
|
|
|
|
|
static const struct packed_fmt_entry mp_packed_formats[] = {
|
2015-01-21 18:29:18 +00:00
|
|
|
// w R G B A
|
2013-07-18 11:52:38 +00:00
|
|
|
{IMGFMT_Y8, 1, {1, 0, 0, 0}},
|
|
|
|
{IMGFMT_Y16, 2, {1, 0, 0, 0}},
|
|
|
|
{IMGFMT_YA8, 1, {1, 0, 0, 2}},
|
2015-01-21 18:29:18 +00:00
|
|
|
{IMGFMT_YA16, 2, {1, 0, 0, 2}},
|
2013-07-18 11:52:38 +00:00
|
|
|
{IMGFMT_ARGB, 1, {2, 3, 4, 1}},
|
|
|
|
{IMGFMT_0RGB, 1, {2, 3, 4, 0}},
|
|
|
|
{IMGFMT_BGRA, 1, {3, 2, 1, 4}},
|
|
|
|
{IMGFMT_BGR0, 1, {3, 2, 1, 0}},
|
|
|
|
{IMGFMT_ABGR, 1, {4, 3, 2, 1}},
|
|
|
|
{IMGFMT_0BGR, 1, {4, 3, 2, 0}},
|
|
|
|
{IMGFMT_RGBA, 1, {1, 2, 3, 4}},
|
|
|
|
{IMGFMT_RGB0, 1, {1, 2, 3, 0}},
|
|
|
|
{IMGFMT_BGR24, 1, {3, 2, 1, 0}},
|
|
|
|
{IMGFMT_RGB24, 1, {1, 2, 3, 0}},
|
|
|
|
{IMGFMT_RGB48, 2, {1, 2, 3, 0}},
|
|
|
|
{IMGFMT_RGBA64, 2, {1, 2, 3, 4}},
|
|
|
|
{IMGFMT_BGRA64, 2, {3, 2, 1, 4}},
|
|
|
|
{0},
|
|
|
|
};
|
2013-03-28 19:48:53 +00:00
|
|
|
|
2014-12-09 20:34:01 +00:00
|
|
|
const struct gl_video_opts gl_video_opts_def = {
|
2013-03-01 20:19:20 +00:00
|
|
|
.dither_depth = -1,
|
2013-05-25 23:48:39 +00:00
|
|
|
.dither_size = 6,
|
2015-07-20 17:09:22 +00:00
|
|
|
.temporal_dither_period = 1,
|
2015-11-19 20:20:50 +00:00
|
|
|
.fbo_format = 0,
|
2015-01-06 09:47:26 +00:00
|
|
|
.sigmoid_center = 0.75,
|
|
|
|
.sigmoid_slope = 6.5,
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
.scaler = {
|
2015-03-27 05:18:32 +00:00
|
|
|
{{"bilinear", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // scale
|
|
|
|
{{NULL, .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // dscale
|
|
|
|
{{"bilinear", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // cscale
|
2015-11-29 12:04:01 +00:00
|
|
|
{{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}},
|
|
|
|
.clamp = 1, }, // tscale
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
},
|
2016-01-25 20:35:39 +00:00
|
|
|
.scaler_resizes_only = 1,
|
2015-12-06 16:22:41 +00:00
|
|
|
.scaler_lut_size = 6,
|
2016-01-27 20:07:17 +00:00
|
|
|
.interpolation_threshold = 0.0001,
|
2015-12-22 22:14:47 +00:00
|
|
|
.alpha_mode = 3,
|
2014-12-09 20:34:01 +00:00
|
|
|
.background = {0, 0, 0, 255},
|
2015-02-03 16:12:04 +00:00
|
|
|
.gamma = 1.0f,
|
2015-10-26 22:43:48 +00:00
|
|
|
.prescale_passes = 1,
|
|
|
|
.prescale_downscaling_threshold = 2.0f,
|
2013-03-01 20:19:20 +00:00
|
|
|
};
|
|
|
|
|
2013-10-24 20:20:16 +00:00
|
|
|
const struct gl_video_opts gl_video_opts_hq_def = {
|
|
|
|
.dither_depth = 0,
|
|
|
|
.dither_size = 6,
|
2015-07-20 17:09:22 +00:00
|
|
|
.temporal_dither_period = 1,
|
2015-11-19 20:20:50 +00:00
|
|
|
.fbo_format = 0,
|
2015-11-07 16:49:14 +00:00
|
|
|
.correct_downscaling = 1,
|
2015-01-06 09:47:26 +00:00
|
|
|
.sigmoid_center = 0.75,
|
|
|
|
.sigmoid_slope = 6.5,
|
|
|
|
.sigmoid_upscaling = 1,
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
.scaler = {
|
2015-03-27 05:18:32 +00:00
|
|
|
{{"spline36", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // scale
|
|
|
|
{{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // dscale
|
|
|
|
{{"spline36", .params={NAN, NAN}}, {.params = {NAN, NAN}}}, // cscale
|
2015-11-29 12:04:01 +00:00
|
|
|
{{"mitchell", .params={NAN, NAN}}, {.params = {NAN, NAN}},
|
|
|
|
.clamp = 1, }, // tscale
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
},
|
2016-01-25 20:35:39 +00:00
|
|
|
.scaler_resizes_only = 1,
|
2015-12-06 16:22:41 +00:00
|
|
|
.scaler_lut_size = 6,
|
2016-01-27 20:07:17 +00:00
|
|
|
.interpolation_threshold = 0.0001,
|
2015-12-22 22:14:47 +00:00
|
|
|
.alpha_mode = 3,
|
2014-12-09 20:34:01 +00:00
|
|
|
.background = {0, 0, 0, 255},
|
2015-02-03 16:12:04 +00:00
|
|
|
.gamma = 1.0f,
|
2015-03-23 01:42:19 +00:00
|
|
|
.blend_subs = 0,
|
2015-09-05 15:39:27 +00:00
|
|
|
.deband = 1,
|
2015-10-26 22:43:48 +00:00
|
|
|
.prescale_passes = 1,
|
|
|
|
.prescale_downscaling_threshold = 2.0f,
|
2013-10-24 20:20:16 +00:00
|
|
|
};
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2013-12-21 19:03:36 +00:00
|
|
|
static int validate_scaler_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
static int validate_window_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
#define OPT_BASE_STRUCT struct gl_video_opts
|
2015-08-29 01:24:15 +00:00
|
|
|
|
|
|
|
#define SCALER_OPTS(n, i) \
|
|
|
|
OPT_STRING_VALIDATE(n, scaler[i].kernel.name, 0, validate_scaler_opt), \
|
|
|
|
OPT_FLOAT(n"-param1", scaler[i].kernel.params[0], 0), \
|
|
|
|
OPT_FLOAT(n"-param2", scaler[i].kernel.params[1], 0), \
|
|
|
|
OPT_FLOAT(n"-blur", scaler[i].kernel.blur, 0), \
|
|
|
|
OPT_FLOAT(n"-wparam", scaler[i].window.params[0], 0), \
|
|
|
|
OPT_FLAG(n"-clamp", scaler[i].clamp, 0), \
|
|
|
|
OPT_FLOATRANGE(n"-radius", scaler[i].radius, 0, 0.5, 16.0), \
|
|
|
|
OPT_FLOATRANGE(n"-antiring", scaler[i].antiring, 0, 0.0, 1.0), \
|
|
|
|
OPT_STRING_VALIDATE(n"-window", scaler[i].window.name, 0, validate_window_opt)
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
const struct m_sub_options gl_video_conf = {
|
2014-06-10 21:56:05 +00:00
|
|
|
.opts = (const m_option_t[]) {
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
OPT_FLAG("dumb-mode", dumb_mode, 0),
|
2015-02-03 16:12:04 +00:00
|
|
|
OPT_FLOATRANGE("gamma", gamma, 0, 0.1, 2.0),
|
2015-02-07 12:54:18 +00:00
|
|
|
OPT_FLAG("gamma-auto", gamma_auto, 0),
|
2015-03-31 05:31:35 +00:00
|
|
|
OPT_CHOICE_C("target-prim", target_prim, 0, mp_csp_prim_names),
|
|
|
|
OPT_CHOICE_C("target-trc", target_trc, 0, mp_csp_trc_names),
|
2013-03-01 20:19:20 +00:00
|
|
|
OPT_FLAG("pbo", pbo, 0),
|
2016-03-05 08:42:57 +00:00
|
|
|
SCALER_OPTS("scale", SCALER_SCALE),
|
|
|
|
SCALER_OPTS("dscale", SCALER_DSCALE),
|
|
|
|
SCALER_OPTS("cscale", SCALER_CSCALE),
|
|
|
|
SCALER_OPTS("tscale", SCALER_TSCALE),
|
2015-12-05 19:14:23 +00:00
|
|
|
OPT_INTRANGE("scaler-lut-size", scaler_lut_size, 0, 4, 10),
|
2013-05-25 21:47:55 +00:00
|
|
|
OPT_FLAG("scaler-resizes-only", scaler_resizes_only, 0),
|
2015-02-06 02:37:21 +00:00
|
|
|
OPT_FLAG("linear-scaling", linear_scaling, 0),
|
2015-11-07 16:49:14 +00:00
|
|
|
OPT_FLAG("correct-downscaling", correct_downscaling, 0),
|
2015-01-06 09:47:26 +00:00
|
|
|
OPT_FLAG("sigmoid-upscaling", sigmoid_upscaling, 0),
|
|
|
|
OPT_FLOATRANGE("sigmoid-center", sigmoid_center, 0, 0.0, 1.0),
|
|
|
|
OPT_FLOATRANGE("sigmoid-slope", sigmoid_slope, 0, 1.0, 20.0),
|
2013-03-01 20:19:20 +00:00
|
|
|
OPT_CHOICE("fbo-format", fbo_format, 0,
|
|
|
|
({"rgb", GL_RGB},
|
|
|
|
{"rgba", GL_RGBA},
|
|
|
|
{"rgb8", GL_RGB8},
|
2015-11-19 13:45:06 +00:00
|
|
|
{"rgba8", GL_RGBA8},
|
2013-03-01 20:19:20 +00:00
|
|
|
{"rgb10", GL_RGB10},
|
2013-10-23 15:46:57 +00:00
|
|
|
{"rgb10_a2", GL_RGB10_A2},
|
2013-03-01 20:19:20 +00:00
|
|
|
{"rgb16", GL_RGB16},
|
|
|
|
{"rgb16f", GL_RGB16F},
|
2013-03-28 20:44:33 +00:00
|
|
|
{"rgb32f", GL_RGB32F},
|
|
|
|
{"rgba12", GL_RGBA12},
|
|
|
|
{"rgba16", GL_RGBA16},
|
|
|
|
{"rgba16f", GL_RGBA16F},
|
2015-11-19 20:20:50 +00:00
|
|
|
{"rgba32f", GL_RGBA32F},
|
|
|
|
{"auto", 0})),
|
2013-03-28 20:39:17 +00:00
|
|
|
OPT_CHOICE_OR_INT("dither-depth", dither_depth, 0, -1, 16,
|
|
|
|
({"no", -1}, {"auto", 0})),
|
2013-05-25 23:48:39 +00:00
|
|
|
OPT_CHOICE("dither", dither_algo, 0,
|
|
|
|
({"fruit", 0}, {"ordered", 1}, {"no", -1})),
|
|
|
|
OPT_INTRANGE("dither-size-fruit", dither_size, 0, 2, 8),
|
|
|
|
OPT_FLAG("temporal-dither", temporal_dither, 0),
|
2015-07-20 17:09:22 +00:00
|
|
|
OPT_INTRANGE("temporal-dither-period", temporal_dither_period, 0, 1, 128),
|
2015-02-27 17:31:24 +00:00
|
|
|
OPT_CHOICE("alpha", alpha_mode, 0,
|
2013-09-19 14:55:56 +00:00
|
|
|
({"no", 0},
|
2015-02-27 17:31:24 +00:00
|
|
|
{"yes", 1},
|
2015-12-22 22:14:47 +00:00
|
|
|
{"blend", 2},
|
|
|
|
{"blend-tiles", 3})),
|
2013-12-01 22:39:13 +00:00
|
|
|
OPT_FLAG("rectangle-textures", use_rectangle, 0),
|
2014-12-09 20:34:01 +00:00
|
|
|
OPT_COLOR("background", background, 0),
|
2015-03-13 18:30:31 +00:00
|
|
|
OPT_FLAG("interpolation", interpolation, 0),
|
2016-01-27 20:07:17 +00:00
|
|
|
OPT_FLOAT("interpolation-threshold", interpolation_threshold, 0),
|
2015-04-11 17:16:34 +00:00
|
|
|
OPT_CHOICE("blend-subtitles", blend_subs, 0,
|
|
|
|
({"no", 0},
|
|
|
|
{"yes", 1},
|
|
|
|
{"video", 2})),
|
2015-03-27 12:27:40 +00:00
|
|
|
OPT_STRING("scale-shader", scale_shader, 0),
|
|
|
|
OPT_STRINGLIST("pre-shaders", pre_shaders, 0),
|
|
|
|
OPT_STRINGLIST("post-shaders", post_shaders, 0),
|
2015-09-05 15:39:27 +00:00
|
|
|
OPT_FLAG("deband", deband, 0),
|
|
|
|
OPT_SUBSTRUCT("deband", deband_opts, deband_conf, 0),
|
2015-09-23 20:43:27 +00:00
|
|
|
OPT_FLOAT("sharpen", unsharp, 0),
|
2016-03-05 11:02:01 +00:00
|
|
|
OPT_CHOICE("prescale-luma", prescale_luma, 0,
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
({"none", 0},
|
2015-12-03 08:32:40 +00:00
|
|
|
{"superxbr", 1}
|
|
|
|
#if HAVE_NNEDI
|
|
|
|
, {"nnedi3", 2}
|
|
|
|
#endif
|
|
|
|
)),
|
2015-10-26 22:43:48 +00:00
|
|
|
OPT_INTRANGE("prescale-passes",
|
|
|
|
prescale_passes, 0, 1, MAX_PRESCALE_PASSES),
|
|
|
|
OPT_FLOATRANGE("prescale-downscaling-threshold",
|
|
|
|
prescale_downscaling_threshold, 0, 0.0, 32.0),
|
|
|
|
OPT_SUBSTRUCT("superxbr", superxbr_opts, superxbr_conf, 0),
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
OPT_SUBSTRUCT("nnedi3", nnedi3_opts, nnedi3_conf, 0),
|
2015-03-23 01:42:19 +00:00
|
|
|
|
2015-01-13 23:45:31 +00:00
|
|
|
OPT_REMOVED("approx-gamma", "this is always enabled now"),
|
2015-01-20 20:46:19 +00:00
|
|
|
OPT_REMOVED("cscale-down", "chroma is never downscaled"),
|
2015-01-22 17:24:50 +00:00
|
|
|
OPT_REMOVED("scale-sep", "this is set automatically whenever sane"),
|
|
|
|
OPT_REMOVED("indirect", "this is set automatically whenever sane"),
|
2015-03-16 09:17:22 +00:00
|
|
|
OPT_REMOVED("srgb", "use target-prim=bt709:target-trc=srgb instead"),
|
2015-09-05 15:39:27 +00:00
|
|
|
OPT_REMOVED("source-shader", "use :deband to enable debanding"),
|
2015-01-20 20:46:19 +00:00
|
|
|
|
|
|
|
OPT_REPLACED("lscale", "scale"),
|
|
|
|
OPT_REPLACED("lscale-down", "scale-down"),
|
|
|
|
OPT_REPLACED("lparam1", "scale-param1"),
|
|
|
|
OPT_REPLACED("lparam2", "scale-param2"),
|
|
|
|
OPT_REPLACED("lradius", "scale-radius"),
|
|
|
|
OPT_REPLACED("lantiring", "scale-antiring"),
|
|
|
|
OPT_REPLACED("cparam1", "cscale-param1"),
|
|
|
|
OPT_REPLACED("cparam2", "cscale-param2"),
|
|
|
|
OPT_REPLACED("cradius", "cscale-radius"),
|
|
|
|
OPT_REPLACED("cantiring", "cscale-antiring"),
|
2015-03-13 18:30:31 +00:00
|
|
|
OPT_REPLACED("smoothmotion", "interpolation"),
|
2015-03-15 06:11:51 +00:00
|
|
|
OPT_REPLACED("smoothmotion-threshold", "tscale-param1"),
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
OPT_REPLACED("scale-down", "dscale"),
|
2015-11-07 16:49:14 +00:00
|
|
|
OPT_REPLACED("fancy-downscaling", "correct-downscaling"),
|
2016-03-05 11:02:01 +00:00
|
|
|
OPT_REPLACED("prescale", "prescale-luma"),
|
2015-01-20 20:46:19 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
{0}
|
|
|
|
},
|
|
|
|
.size = sizeof(struct gl_video_opts),
|
|
|
|
.defaults = &gl_video_opts_def,
|
|
|
|
};
|
|
|
|
|
|
|
|
static void uninit_rendering(struct gl_video *p);
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
static void uninit_scaler(struct gl_video *p, struct scaler *scaler);
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
static void check_gl_features(struct gl_video *p);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
static bool init_format(struct gl_video *p, int fmt, bool test_only);
|
|
|
|
static void init_image_desc(struct gl_video *p, int fmt);
|
2015-07-15 10:22:49 +00:00
|
|
|
static void gl_video_upload_image(struct gl_video *p, struct mp_image *mpi);
|
2015-09-08 20:46:36 +00:00
|
|
|
static void assign_options(struct gl_video_opts *dst, struct gl_video_opts *src);
|
2016-04-08 20:21:31 +00:00
|
|
|
static void get_scale_factors(struct gl_video *p, bool transpose_rot, double xy[2]);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
#define GLSL(x) gl_sc_add(p->sc, #x "\n");
|
|
|
|
#define GLSLF(...) gl_sc_addf(p->sc, __VA_ARGS__)
|
2016-02-27 22:56:33 +00:00
|
|
|
#define GLSLHF(...) gl_sc_haddf(p->sc, __VA_ARGS__)
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-09-23 20:13:03 +00:00
|
|
|
static const char *load_cached_file(struct gl_video *p, const char *path)
|
|
|
|
{
|
|
|
|
if (!path || !path[0])
|
|
|
|
return NULL;
|
|
|
|
for (int n = 0; n < p->num_files; n++) {
|
|
|
|
if (strcmp(p->files[n].path, path) == 0)
|
|
|
|
return p->files[n].body;
|
|
|
|
}
|
|
|
|
// not found -> load it
|
|
|
|
if (p->num_files == MP_ARRAY_SIZE(p->files)) {
|
|
|
|
// empty cache when it overflows
|
|
|
|
for (int n = 0; n < p->num_files; n++) {
|
|
|
|
talloc_free(p->files[n].path);
|
|
|
|
talloc_free(p->files[n].body);
|
|
|
|
}
|
|
|
|
p->num_files = 0;
|
|
|
|
}
|
|
|
|
struct bstr s = stream_read_file(path, p, p->global, 100000); // 100 kB
|
|
|
|
if (s.len) {
|
|
|
|
struct cached_file *new = &p->files[p->num_files++];
|
|
|
|
*new = (struct cached_file) {
|
|
|
|
.path = talloc_strdup(p, path),
|
|
|
|
.body = s.start
|
|
|
|
};
|
|
|
|
return new->body;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
static void debug_check_gl(struct gl_video *p, const char *msg)
|
|
|
|
{
|
|
|
|
if (p->gl_debug)
|
2013-09-11 22:57:32 +00:00
|
|
|
glCheckError(p->gl, p->log, msg);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_set_debug(struct gl_video *p, bool enable)
|
|
|
|
{
|
2014-12-23 01:46:44 +00:00
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
p->gl_debug = enable;
|
2015-01-30 10:12:58 +00:00
|
|
|
if (p->gl->debug_context)
|
|
|
|
gl_set_debug_logger(gl, enable ? p->log : NULL);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
static void gl_video_reset_surfaces(struct gl_video *p)
|
|
|
|
{
|
2015-11-28 14:45:35 +00:00
|
|
|
for (int i = 0; i < FBOSURFACES_MAX; i++)
|
2015-06-26 08:59:57 +00:00
|
|
|
p->surfaces[i].pts = MP_NOPTS_VALUE;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
p->surface_idx = 0;
|
|
|
|
p->surface_now = 0;
|
2015-07-02 11:17:20 +00:00
|
|
|
p->frames_drawn = 0;
|
2015-09-05 10:02:02 +00:00
|
|
|
p->output_fbo_valid = false;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
static inline int fbosurface_wrap(int id)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
{
|
2015-03-13 18:30:31 +00:00
|
|
|
id = id % FBOSURFACES_MAX;
|
|
|
|
return id < 0 ? id + FBOSURFACES_MAX : id;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
static void recreate_osd(struct gl_video *p)
|
|
|
|
{
|
2015-03-23 15:32:59 +00:00
|
|
|
mpgl_osd_destroy(p->osd);
|
|
|
|
p->osd = NULL;
|
|
|
|
if (p->osd_state) {
|
|
|
|
p->osd = mpgl_osd_init(p->gl, p->log, p->osd_state);
|
|
|
|
mpgl_osd_set_options(p->osd, p->opts.pbo);
|
|
|
|
}
|
2015-02-03 16:12:04 +00:00
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
static void reinit_rendering(struct gl_video *p)
|
|
|
|
{
|
2013-07-31 19:44:21 +00:00
|
|
|
MP_VERBOSE(p, "Reinit rendering.\n");
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
debug_check_gl(p, "before scaler initialization");
|
|
|
|
|
|
|
|
uninit_rendering(p);
|
|
|
|
|
|
|
|
recreate_osd(p);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void uninit_rendering(struct gl_video *p)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++)
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
uninit_scaler(p, &p->scaler[n]);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
gl->DeleteTextures(1, &p->dither_texture);
|
|
|
|
p->dither_texture = 0;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
gl->DeleteBuffers(1, &p->nnedi3_weights_buffer);
|
2016-01-03 22:11:23 +00:00
|
|
|
p->nnedi3_weights_buffer = 0;
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
for (int n = 0; n < 4; n++) {
|
|
|
|
fbotex_uninit(&p->merge_fbo[n]);
|
|
|
|
fbotex_uninit(&p->deband_fbo[n]);
|
|
|
|
fbotex_uninit(&p->scale_fbo[n]);
|
|
|
|
fbotex_uninit(&p->integer_fbo[n]);
|
|
|
|
}
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
fbotex_uninit(&p->indirect_fbo);
|
2015-03-23 01:42:19 +00:00
|
|
|
fbotex_uninit(&p->blend_subs_fbo);
|
2015-09-23 20:43:27 +00:00
|
|
|
fbotex_uninit(&p->unsharp_fbo);
|
2016-01-26 19:47:32 +00:00
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
for (int n = 0; n < 2; n++) {
|
|
|
|
fbotex_uninit(&p->pre_fbo[n]);
|
|
|
|
fbotex_uninit(&p->post_fbo[n]);
|
|
|
|
}
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
for (int pass = 0; pass < MAX_PRESCALE_PASSES; pass++) {
|
|
|
|
for (int step = 0; step < MAX_PRESCALE_STEPS; step++)
|
|
|
|
fbotex_uninit(&p->prescale_fbo[pass][step]);
|
|
|
|
}
|
|
|
|
|
2015-03-13 23:32:20 +00:00
|
|
|
for (int n = 0; n < FBOSURFACES_MAX; n++)
|
|
|
|
fbotex_uninit(&p->surfaces[n].fbotex);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
gl_video_reset_surfaces(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
void gl_video_update_profile(struct gl_video *p)
|
|
|
|
{
|
|
|
|
if (p->use_lut_3d)
|
|
|
|
return;
|
|
|
|
|
|
|
|
p->use_lut_3d = true;
|
|
|
|
check_gl_features(p);
|
|
|
|
|
|
|
|
reinit_rendering(p);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool gl_video_get_lut3d(struct gl_video *p, enum mp_csp_prim prim,
|
|
|
|
enum mp_csp_trc trc)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
if (!p->cms || !p->use_lut_3d)
|
|
|
|
return false;
|
2014-03-24 22:30:12 +00:00
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
if (!gl_lcms_has_changed(p->cms, prim, trc))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
struct lut3d *lut3d = NULL;
|
|
|
|
if (!gl_lcms_get_lut3d(p->cms, &lut3d, prim, trc) || !lut3d) {
|
|
|
|
return false;
|
2015-11-19 20:21:04 +00:00
|
|
|
}
|
2014-12-23 01:48:58 +00:00
|
|
|
|
2014-03-24 22:30:12 +00:00
|
|
|
if (!p->lut_3d_texture)
|
|
|
|
gl->GenTextures(1, &p->lut_3d_texture);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + TEXUNIT_3DLUT);
|
|
|
|
gl->BindTexture(GL_TEXTURE_3D, p->lut_3d_texture);
|
|
|
|
gl->TexImage3D(GL_TEXTURE_3D, 0, GL_RGB16, lut3d->size[0], lut3d->size[1],
|
|
|
|
lut3d->size[2], 0, GL_RGB, GL_UNSIGNED_SHORT, lut3d->data);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
|
|
|
|
|
|
|
debug_check_gl(p, "after 3d lut creation");
|
2014-03-24 22:30:12 +00:00
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
return true;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Fill an img_tex struct from an FBO + some metadata
|
|
|
|
static struct img_tex img_tex_fbo(struct fbotex *fbo, struct gl_transform t,
|
|
|
|
enum plane_type type, int components)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
assert(type != PLANE_NONE);
|
|
|
|
return (struct img_tex){
|
|
|
|
.type = type,
|
|
|
|
.gl_tex = fbo->texture,
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
.gl_target = GL_TEXTURE_2D,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
.multiplier = 1.0,
|
|
|
|
.use_integer = false,
|
|
|
|
.tex_w = fbo->rw,
|
|
|
|
.tex_h = fbo->rh,
|
|
|
|
.w = fbo->lw,
|
|
|
|
.h = fbo->lh,
|
2016-04-08 20:21:31 +00:00
|
|
|
.pre_transform = identity_trans,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
.transform = t,
|
|
|
|
.components = components,
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
};
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Bind an img_tex to a free texture unit and return its ID. At most
|
|
|
|
// TEXUNIT_VIDEO_NUM texture units can be bound at once
|
|
|
|
static int pass_bind(struct gl_video *p, struct img_tex tex)
|
2013-03-28 19:40:19 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
assert(p->pass_tex_num < TEXUNIT_VIDEO_NUM);
|
|
|
|
p->pass_tex[p->pass_tex_num] = tex;
|
|
|
|
return p->pass_tex_num++;
|
|
|
|
}
|
2013-03-28 19:40:19 +00:00
|
|
|
|
2016-04-08 20:21:31 +00:00
|
|
|
// Rotation by 90° and flipping.
|
|
|
|
static void get_plane_source_transform(struct gl_video *p, int w, int h,
|
|
|
|
struct gl_transform *out_tr)
|
|
|
|
{
|
|
|
|
struct gl_transform tr = identity_trans;
|
|
|
|
int a = p->image_params.rotate % 90 ? 0 : p->image_params.rotate / 90;
|
|
|
|
int sin90[4] = {0, 1, 0, -1}; // just to avoid rounding issues etc.
|
|
|
|
int cos90[4] = {1, 0, -1, 0};
|
|
|
|
struct gl_transform rot = {{{cos90[a], sin90[a]}, {-sin90[a], cos90[a]}}};
|
|
|
|
gl_transform_trans(rot, &tr);
|
|
|
|
|
|
|
|
// basically, recenter to keep the whole image in view
|
|
|
|
float b[2] = {1, 1};
|
|
|
|
gl_transform_vec(rot, &b[0], &b[1]);
|
|
|
|
tr.t[0] += b[0] < 0 ? w : 0;
|
|
|
|
tr.t[1] += b[1] < 0 ? h : 0;
|
|
|
|
|
|
|
|
if (p->image.image_flipped) {
|
|
|
|
struct gl_transform flip = {{{1, 0}, {0, -1}}, {0, h}};
|
|
|
|
gl_transform_trans(flip, &tr);
|
|
|
|
}
|
|
|
|
|
|
|
|
*out_tr = tr;
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Places a video_image's image textures + associated metadata into tex[]. The
|
|
|
|
// number of textures is equal to p->plane_count.
|
|
|
|
static void pass_get_img_tex(struct gl_video *p, struct video_image *vimg,
|
|
|
|
struct img_tex tex[4])
|
|
|
|
{
|
2015-01-22 17:29:37 +00:00
|
|
|
assert(vimg->mpi);
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Determine the chroma offset
|
|
|
|
struct gl_transform chroma = (struct gl_transform){{{0}}};
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
float ls_w = 1.0 / (1 << p->image_desc.chroma_xs);
|
|
|
|
float ls_h = 1.0 / (1 << p->image_desc.chroma_ys);
|
|
|
|
|
2015-04-02 21:59:50 +00:00
|
|
|
if (p->image_params.chroma_location != MP_CHROMA_CENTER) {
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
int cx, cy;
|
2015-04-02 21:59:50 +00:00
|
|
|
mp_get_chroma_location(p->image_params.chroma_location, &cx, &cy);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
// By default texture coordinates are such that chroma is centered with
|
|
|
|
// any chroma subsampling. If a specific direction is given, make it
|
|
|
|
// so that the luma and chroma sample line up exactly.
|
|
|
|
// For 4:4:4, setting chroma location should have no effect at all.
|
|
|
|
// luma sample size (in chroma coord. space)
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
chroma.t[0] = ls_w < 1 ? ls_w * -cx / 2 : 0;
|
|
|
|
chroma.t[1] = ls_h < 1 ? ls_h * -cy / 2 : 0;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Make sure luma/chroma sizes are aligned.
|
|
|
|
// Example: For 4:2:0 with size 3x3, the subsampled chroma plane is 2x2
|
|
|
|
// so luma (3,3) has to align with chroma (2,2).
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
chroma.m[0][0] = ls_w * (float)vimg->planes[0].w / vimg->planes[1].w;
|
|
|
|
chroma.m[1][1] = ls_h * (float)vimg->planes[0].h / vimg->planes[1].h;
|
|
|
|
|
|
|
|
// The existing code assumes we just have a single tex multiplier for
|
|
|
|
// all of the planes. This may change in the future
|
|
|
|
float tex_mul = 1.0 / mp_get_csp_mul(p->image_params.colorspace,
|
|
|
|
p->image_desc.component_bits,
|
|
|
|
p->image_desc.component_full_bits);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
memset(tex, 0, 4 * sizeof(tex[0]));
|
2015-03-13 12:42:05 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct texplane *t = &vimg->planes[n];
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
|
|
|
|
enum plane_type type;
|
|
|
|
if (n >= 3) {
|
|
|
|
type = PLANE_ALPHA;
|
|
|
|
} else if (p->image_desc.flags & MP_IMGFLAG_RGB) {
|
|
|
|
type = PLANE_RGB;
|
|
|
|
} else if (p->image_desc.flags & MP_IMGFLAG_YUV) {
|
|
|
|
type = n == 0 ? PLANE_LUMA : PLANE_CHROMA;
|
|
|
|
} else if (p->image_desc.flags & MP_IMGFLAG_XYZ) {
|
|
|
|
type = PLANE_XYZ;
|
|
|
|
} else {
|
|
|
|
abort();
|
|
|
|
}
|
|
|
|
|
|
|
|
tex[n] = (struct img_tex){
|
|
|
|
.type = type,
|
2016-01-26 19:47:32 +00:00
|
|
|
.gl_tex = t->gl_texture,
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
.gl_target = t->gl_target,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
.multiplier = tex_mul,
|
2016-01-26 19:47:32 +00:00
|
|
|
.use_integer = t->use_integer,
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
.tex_w = t->tex_w,
|
|
|
|
.tex_h = t->tex_h,
|
2015-09-02 10:52:11 +00:00
|
|
|
.w = t->w,
|
|
|
|
.h = t->h,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
.transform = type == PLANE_CHROMA ? chroma : identity_trans,
|
|
|
|
.components = p->image_desc.components[n],
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
};
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
snprintf(tex[n].swizzle, sizeof(tex[n].swizzle), "%s", t->swizzle);
|
2016-04-08 20:21:31 +00:00
|
|
|
get_plane_source_transform(p, t->w, t->h, &tex[n].pre_transform);
|
|
|
|
if (p->image_params.rotate % 180 == 90)
|
|
|
|
MPSWAP(int, tex[n].w, tex[n].h);
|
2013-03-28 19:40:19 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
static void init_video(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
if (p->hwdec && p->hwdec->driver->imgfmt == p->image_params.imgfmt) {
|
2015-01-29 18:53:49 +00:00
|
|
|
if (p->hwdec->driver->reinit(p->hwdec, &p->image_params) < 0)
|
|
|
|
MP_ERR(p, "Initializing texture for hardware decoding failed.\n");
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
init_image_desc(p, p->image_params.imgfmt);
|
|
|
|
p->hwdec_active = true;
|
|
|
|
} else {
|
|
|
|
init_format(p, p->image_params.imgfmt, false);
|
2015-01-29 18:53:49 +00:00
|
|
|
}
|
2013-12-01 22:39:13 +00:00
|
|
|
|
2016-05-11 15:39:38 +00:00
|
|
|
// Format-dependent checks.
|
|
|
|
check_gl_features(p);
|
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
mp_image_params_guess_csp(&p->image_params);
|
2013-12-01 22:39:13 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
int eq_caps = MP_CSP_EQ_CAPS_GAMMA;
|
2015-12-12 13:47:30 +00:00
|
|
|
if (p->image_params.colorspace != MP_CSP_BT_2020_C)
|
2013-03-01 20:19:20 +00:00
|
|
|
eq_caps |= MP_CSP_EQ_CAPS_COLORMATRIX;
|
2014-03-31 02:51:47 +00:00
|
|
|
if (p->image_desc.flags & MP_IMGFLAG_XYZ)
|
|
|
|
eq_caps |= MP_CSP_EQ_CAPS_BRIGHTNESS;
|
2013-03-01 20:19:20 +00:00
|
|
|
p->video_eq.capabilities = eq_caps;
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
av_lfg_init(&p->lfg, 1);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
debug_check_gl(p, "before video texture creation");
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
if (!p->hwdec_active) {
|
|
|
|
struct video_image *vimg = &p->image;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
GLenum gl_target =
|
|
|
|
p->opts.use_rectangle ? GL_TEXTURE_RECTANGLE : GL_TEXTURE_2D;
|
2015-09-02 11:08:18 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
struct mp_image layout = {0};
|
|
|
|
mp_image_set_params(&layout, &p->image_params);
|
|
|
|
|
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2013-03-28 19:40:19 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
plane->gl_target = gl_target;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
plane->w = plane->tex_w = mp_image_plane_w(&layout, n);
|
|
|
|
plane->h = plane->tex_h = mp_image_plane_h(&layout, n);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2013-11-03 23:00:18 +00:00
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + n);
|
|
|
|
gl->GenTextures(1, &plane->gl_texture);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
gl->BindTexture(gl_target, plane->gl_texture);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
gl->TexImage2D(gl_target, 0, plane->gl_internal_format,
|
2015-09-02 10:52:11 +00:00
|
|
|
plane->w, plane->h, 0,
|
2013-11-03 23:00:18 +00:00
|
|
|
plane->gl_format, plane->gl_type, NULL);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-01-26 19:47:32 +00:00
|
|
|
int filter = plane->use_integer ? GL_NEAREST : GL_LINEAR;
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
gl->TexParameteri(gl_target, GL_TEXTURE_MIN_FILTER, filter);
|
|
|
|
gl->TexParameteri(gl_target, GL_TEXTURE_MAG_FILTER, filter);
|
|
|
|
gl->TexParameteri(gl_target, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
|
|
|
|
gl->TexParameteri(gl_target, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
|
2013-03-28 19:48:53 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
MP_VERBOSE(p, "Texture for plane %d: %dx%d\n", n, plane->w, plane->h);
|
|
|
|
}
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
debug_check_gl(p, "after video texture creation");
|
|
|
|
|
|
|
|
reinit_rendering(p);
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
static void unref_current_image(struct gl_video *p)
|
|
|
|
{
|
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
|
|
|
|
if (vimg->hwdec_mapped) {
|
|
|
|
assert(p->hwdec_active);
|
|
|
|
if (p->hwdec->driver->unmap)
|
|
|
|
p->hwdec->driver->unmap(p->hwdec);
|
|
|
|
memset(vimg->planes, 0, sizeof(vimg->planes));
|
|
|
|
vimg->hwdec_mapped = false;
|
|
|
|
}
|
|
|
|
mp_image_unrefp(&vimg->mpi);
|
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
static void uninit_video(struct gl_video *p)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
uninit_rendering(p);
|
|
|
|
|
2013-03-28 19:40:19 +00:00
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
|
|
|
|
2015-03-13 23:32:20 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
2013-03-28 19:40:19 +00:00
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
gl->DeleteTextures(1, &plane->gl_texture);
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->DeleteBuffers(1, &plane->gl_buffer);
|
|
|
|
}
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
*vimg = (struct video_image){0};
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-01-07 18:00:26 +00:00
|
|
|
// Invalidate image_params to ensure that gl_video_config() will call
|
|
|
|
// init_video() on uninitialized gl_video.
|
2015-01-29 18:53:49 +00:00
|
|
|
p->real_image_params = (struct mp_image_params){0};
|
|
|
|
p->image_params = p->real_image_params;
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
p->hwdec_active = false;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
static void pass_prepare_src_tex(struct gl_video *p)
|
2013-05-25 23:48:39 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct gl_shader_cache *sc = p->sc;
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
for (int n = 0; n < p->pass_tex_num; n++) {
|
|
|
|
struct img_tex *s = &p->pass_tex[n];
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
if (!s->gl_tex)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
char texture_name[32];
|
|
|
|
char texture_size[32];
|
2016-02-25 20:27:55 +00:00
|
|
|
char pixel_size[32];
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
snprintf(texture_name, sizeof(texture_name), "texture%d", n);
|
|
|
|
snprintf(texture_size, sizeof(texture_size), "texture_size%d", n);
|
2016-02-25 20:27:55 +00:00
|
|
|
snprintf(pixel_size, sizeof(pixel_size), "pixel_size%d", n);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
2016-01-26 19:47:32 +00:00
|
|
|
if (s->use_integer) {
|
|
|
|
gl_sc_uniform_sampler_ui(sc, texture_name, n);
|
|
|
|
} else {
|
|
|
|
gl_sc_uniform_sampler(sc, texture_name, s->gl_target, n);
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
float f[2] = {1, 1};
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
if (s->gl_target != GL_TEXTURE_RECTANGLE) {
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
f[0] = s->tex_w;
|
|
|
|
f[1] = s->tex_h;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
gl_sc_uniform_vec2(sc, texture_size, f);
|
2016-02-25 20:27:55 +00:00
|
|
|
gl_sc_uniform_vec2(sc, pixel_size, (GLfloat[]){1.0f / f[0],
|
|
|
|
1.0f / f[1]});
|
2013-05-25 23:48:39 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + n);
|
|
|
|
gl->BindTexture(s->gl_target, s->gl_tex);
|
|
|
|
}
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
|
|
|
}
|
2013-05-25 23:48:39 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
static void render_pass_quad(struct gl_video *p, int vp_w, int vp_h,
|
2016-03-28 14:30:48 +00:00
|
|
|
const struct mp_rect *dst)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
2016-03-28 14:13:56 +00:00
|
|
|
struct vertex va[4] = {0};
|
2013-05-25 23:48:39 +00:00
|
|
|
|
2015-03-13 20:14:18 +00:00
|
|
|
struct gl_transform t;
|
|
|
|
gl_transform_ortho(&t, 0, vp_w, 0, vp_h);
|
2013-05-25 23:48:39 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
float x[2] = {dst->x0, dst->x1};
|
|
|
|
float y[2] = {dst->y0, dst->y1};
|
2015-03-13 20:14:18 +00:00
|
|
|
gl_transform_vec(t, &x[0], &y[0]);
|
|
|
|
gl_transform_vec(t, &x[1], &y[1]);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
for (int n = 0; n < 4; n++) {
|
|
|
|
struct vertex *v = &va[n];
|
|
|
|
v->position.x = x[n / 2];
|
|
|
|
v->position.y = y[n % 2];
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
for (int i = 0; i < p->pass_tex_num; i++) {
|
|
|
|
struct img_tex *s = &p->pass_tex[i];
|
|
|
|
if (!s->gl_tex)
|
|
|
|
continue;
|
2016-04-08 20:21:31 +00:00
|
|
|
struct gl_transform tr = s->transform;
|
|
|
|
gl_transform_trans(s->pre_transform, &tr);
|
2016-03-28 14:30:48 +00:00
|
|
|
float tx = (n / 2) * s->w;
|
|
|
|
float ty = (n % 2) * s->h;
|
2016-04-08 20:21:31 +00:00
|
|
|
gl_transform_vec(tr, &tx, &ty);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
bool rect = s->gl_target == GL_TEXTURE_RECTANGLE;
|
2016-03-28 14:30:48 +00:00
|
|
|
v->texcoord[i].x = tx / (rect ? 1 : s->tex_w);
|
|
|
|
v->texcoord[i].y = ty / (rect ? 1 : s->tex_h);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-16 19:10:48 +00:00
|
|
|
p->gl->Viewport(0, 0, vp_w, abs(vp_h));
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl_vao_draw_data(&p->vao, GL_TRIANGLE_STRIP, va, 4);
|
|
|
|
|
|
|
|
debug_check_gl(p, "after rendering");
|
2013-05-25 23:48:39 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// flags: see render_pass_quad
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
static void finish_pass_direct(struct gl_video *p, GLint fbo, int vp_w, int vp_h,
|
2016-03-28 14:30:48 +00:00
|
|
|
const struct mp_rect *dst)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
pass_prepare_src_tex(p);
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
gl_sc_gen_shader_and_reset(p->sc);
|
2016-03-28 14:30:48 +00:00
|
|
|
render_pass_quad(p, vp_w, vp_h, dst);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, 0);
|
|
|
|
memset(&p->pass_tex, 0, sizeof(p->pass_tex));
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
p->pass_tex_num = 0;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// dst_fbo: this will be used for rendering; possibly reallocating the whole
|
|
|
|
// FBO, if the required parameters have changed
|
|
|
|
// w, h: required FBO target dimension, and also defines the target rectangle
|
|
|
|
// used for rasterization
|
|
|
|
// flags: 0 or combination of FBOTEX_FUZZY_W/FBOTEX_FUZZY_H (setting the fuzzy
|
2015-03-13 10:53:54 +00:00
|
|
|
// flags allows the FBO to be larger than the w/h parameters)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
static void finish_pass_fbo(struct gl_video *p, struct fbotex *dst_fbo,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int w, int h, int flags)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
|
|
|
fbotex_change(dst_fbo, p->gl, p->log, w, h, p->opts.fbo_format, flags);
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_direct(p, dst_fbo->fbo, dst_fbo->rw, dst_fbo->rh,
|
2016-03-28 14:30:48 +00:00
|
|
|
&(struct mp_rect){0, 0, w, h});
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
2016-03-05 11:38:51 +00:00
|
|
|
static void skip_unused(struct gl_video *p, int num_components)
|
|
|
|
{
|
|
|
|
for (int i = num_components; i < 4; i++)
|
|
|
|
GLSLF("color.%c = %f;\n", "rgba"[i], i < 3 ? 0.0 : 1.0);
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
static void uninit_scaler(struct gl_video *p, struct scaler *scaler)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
2015-03-13 23:32:20 +00:00
|
|
|
fbotex_uninit(&scaler->sep_fbo);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl->DeleteTextures(1, &scaler->gl_lut);
|
|
|
|
scaler->gl_lut = 0;
|
|
|
|
scaler->kernel = NULL;
|
|
|
|
scaler->initialized = false;
|
|
|
|
}
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
static void load_shader(struct gl_video *p, const char *body)
|
|
|
|
{
|
|
|
|
gl_sc_hadd(p->sc, body);
|
|
|
|
gl_sc_uniform_f(p->sc, "random", (double)av_lfg_get(&p->lfg) / UINT32_MAX);
|
|
|
|
gl_sc_uniform_f(p->sc, "frame", p->frames_uploaded);
|
2016-02-22 21:07:04 +00:00
|
|
|
gl_sc_uniform_vec2(p->sc, "image_size", (GLfloat[]){p->image_params.w,
|
|
|
|
p->image_params.h});
|
2015-03-27 12:27:40 +00:00
|
|
|
}
|
|
|
|
|
2016-01-25 19:24:41 +00:00
|
|
|
static const char *get_custom_shader_fn(struct gl_video *p, const char *body)
|
|
|
|
{
|
|
|
|
if (!p->gl->es && strstr(body, "sample") && !strstr(body, "sample_pixel")) {
|
|
|
|
if (!p->custom_shader_fn_warned) {
|
|
|
|
MP_WARN(p, "sample() is deprecated in custom shaders. "
|
|
|
|
"Use sample_pixel()\n");
|
|
|
|
p->custom_shader_fn_warned = true;
|
|
|
|
}
|
|
|
|
return "sample";
|
|
|
|
}
|
|
|
|
return "sample_pixel";
|
|
|
|
}
|
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
// Applies an arbitrary number of shaders in sequence, using the given pair
|
|
|
|
// of FBOs as intermediate buffers. Returns whether any shaders were applied.
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
static bool apply_shaders(struct gl_video *p, char **shaders, int w, int h,
|
|
|
|
struct fbotex textures[2])
|
2015-03-27 12:27:40 +00:00
|
|
|
{
|
|
|
|
if (!shaders)
|
|
|
|
return false;
|
|
|
|
bool success = false;
|
|
|
|
int tex = 0;
|
|
|
|
for (int n = 0; shaders[n]; n++) {
|
2015-09-23 20:13:03 +00:00
|
|
|
const char *body = load_cached_file(p, shaders[n]);
|
2015-03-27 12:27:40 +00:00
|
|
|
if (!body)
|
|
|
|
continue;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &textures[tex], w, h, 0);
|
|
|
|
int id = pass_bind(p, img_tex_fbo(&textures[tex], identity_trans,
|
2016-03-05 11:38:51 +00:00
|
|
|
PLANE_RGB, p->components));
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
GLSLHF("#define pixel_size pixel_size%d\n", id);
|
2015-03-27 12:27:40 +00:00
|
|
|
load_shader(p, body);
|
2016-01-25 19:24:41 +00:00
|
|
|
const char *fn_name = get_custom_shader_fn(p, body);
|
2015-03-27 12:27:40 +00:00
|
|
|
GLSLF("// custom shader\n");
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSLF("color = %s(texture%d, texcoord%d, texture_size%d);\n",
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
fn_name, id, id, id);
|
2015-03-27 12:27:40 +00:00
|
|
|
tex = (tex+1) % 2;
|
|
|
|
success = true;
|
|
|
|
}
|
|
|
|
return success;
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
// Semantic equality
|
|
|
|
static bool double_seq(double a, double b)
|
|
|
|
{
|
|
|
|
return (isnan(a) && isnan(b)) || a == b;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool scaler_fun_eq(struct scaler_fun a, struct scaler_fun b)
|
|
|
|
{
|
|
|
|
if ((a.name && !b.name) || (b.name && !a.name))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return ((!a.name && !b.name) || strcmp(a.name, b.name) == 0) &&
|
|
|
|
double_seq(a.params[0], b.params[0]) &&
|
|
|
|
double_seq(a.params[1], b.params[1]) &&
|
|
|
|
a.blur == b.blur;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool scaler_conf_eq(struct scaler_config a, struct scaler_config b)
|
|
|
|
{
|
|
|
|
// Note: antiring isn't compared because it doesn't affect LUT
|
|
|
|
// generation
|
|
|
|
return scaler_fun_eq(a.kernel, b.kernel) &&
|
|
|
|
scaler_fun_eq(a.window, b.window) &&
|
2015-08-20 19:45:58 +00:00
|
|
|
a.radius == b.radius &&
|
|
|
|
a.clamp == b.clamp;
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void reinit_scaler(struct gl_video *p, struct scaler *scaler,
|
|
|
|
const struct scaler_config *conf,
|
|
|
|
double scale_factor,
|
|
|
|
int sizes[])
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
if (scaler_conf_eq(scaler->conf, *conf) &&
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
scaler->scale_factor == scale_factor &&
|
|
|
|
scaler->initialized)
|
2014-04-20 19:30:23 +00:00
|
|
|
return;
|
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
uninit_scaler(p, scaler);
|
2014-04-20 19:30:23 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
scaler->conf = *conf;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
scaler->scale_factor = scale_factor;
|
|
|
|
scaler->insufficient = false;
|
|
|
|
scaler->initialized = true;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
const struct filter_kernel *t_kernel = mp_find_filter_kernel(conf->kernel.name);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
if (!t_kernel)
|
|
|
|
return;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
scaler->kernel_storage = *t_kernel;
|
|
|
|
scaler->kernel = &scaler->kernel_storage;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
const char *win = conf->window.name;
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
if (!win || !win[0])
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
win = t_kernel->window; // fall back to the scaler's default window
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
const struct filter_window *t_window = mp_find_filter_window(win);
|
|
|
|
if (t_window)
|
|
|
|
scaler->kernel->w = *t_window;
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
for (int n = 0; n < 2; n++) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
if (!isnan(conf->kernel.params[n]))
|
|
|
|
scaler->kernel->f.params[n] = conf->kernel.params[n];
|
|
|
|
if (!isnan(conf->window.params[n]))
|
|
|
|
scaler->kernel->w.params[n] = conf->window.params[n];
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
2014-04-20 19:37:18 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
if (conf->kernel.blur > 0.0)
|
|
|
|
scaler->kernel->f.blur = conf->kernel.blur;
|
|
|
|
if (conf->window.blur > 0.0)
|
|
|
|
scaler->kernel->w.blur = conf->window.blur;
|
2014-04-20 19:30:23 +00:00
|
|
|
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
if (scaler->kernel->f.resizable && conf->radius > 0.0)
|
|
|
|
scaler->kernel->f.radius = conf->radius;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-08-20 19:45:58 +00:00
|
|
|
scaler->kernel->clamp = conf->clamp;
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
scaler->insufficient = !mp_init_filter(scaler->kernel, sizes, scale_factor);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-11-19 20:20:40 +00:00
|
|
|
if (scaler->kernel->polar && (gl->mpgl_caps & MPGL_CAP_1D_TEX)) {
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
scaler->gl_target = GL_TEXTURE_1D;
|
|
|
|
} else {
|
|
|
|
scaler->gl_target = GL_TEXTURE_2D;
|
|
|
|
}
|
|
|
|
|
|
|
|
int size = scaler->kernel->size;
|
|
|
|
int elems_per_pixel = 4;
|
|
|
|
if (size == 1) {
|
|
|
|
elems_per_pixel = 1;
|
|
|
|
} else if (size == 2) {
|
|
|
|
elems_per_pixel = 2;
|
|
|
|
} else if (size == 6) {
|
|
|
|
elems_per_pixel = 3;
|
|
|
|
}
|
|
|
|
int width = size / elems_per_pixel;
|
|
|
|
assert(size == width * elems_per_pixel);
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_float16_format(gl, elems_per_pixel);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLenum target = scaler->gl_target;
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + TEXUNIT_SCALERS + scaler->index);
|
|
|
|
|
|
|
|
if (!scaler->gl_lut)
|
|
|
|
gl->GenTextures(1, &scaler->gl_lut);
|
|
|
|
|
|
|
|
gl->BindTexture(target, scaler->gl_lut);
|
|
|
|
|
2015-12-05 19:14:23 +00:00
|
|
|
scaler->lut_size = 1 << p->opts.scaler_lut_size;
|
2015-12-05 18:54:25 +00:00
|
|
|
|
|
|
|
float *weights = talloc_array(NULL, float, scaler->lut_size * size);
|
|
|
|
mp_compute_lut(scaler->kernel, scaler->lut_size, weights);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
if (target == GL_TEXTURE_1D) {
|
2015-12-05 18:54:25 +00:00
|
|
|
gl->TexImage1D(target, 0, fmt->internal_format, scaler->lut_size,
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
0, fmt->format, GL_FLOAT, weights);
|
|
|
|
} else {
|
2015-12-05 18:54:25 +00:00
|
|
|
gl->TexImage2D(target, 0, fmt->internal_format, width, scaler->lut_size,
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
0, fmt->format, GL_FLOAT, weights);
|
|
|
|
}
|
|
|
|
|
|
|
|
talloc_free(weights);
|
|
|
|
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
|
|
|
|
if (target != GL_TEXTURE_1D)
|
|
|
|
gl->TexParameteri(target, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
|
|
|
|
|
|
|
debug_check_gl(p, "after initializing scaler");
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-09-05 12:03:00 +00:00
|
|
|
// Special helper for sampling from two separated stages
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
static void pass_sample_separated(struct gl_video *p, struct img_tex src,
|
|
|
|
struct scaler *scaler, int w, int h)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Separate the transformation into x and y components, per pass
|
|
|
|
struct gl_transform t_x = {
|
|
|
|
.m = {{src.transform.m[0][0], 0.0}, {src.transform.m[1][0], 1.0}},
|
|
|
|
.t = {src.transform.t[0], 0.0},
|
|
|
|
};
|
|
|
|
struct gl_transform t_y = {
|
|
|
|
.m = {{1.0, src.transform.m[0][1]}, {0.0, src.transform.m[1][1]}},
|
|
|
|
.t = {0.0, src.transform.t[1]},
|
|
|
|
};
|
|
|
|
|
|
|
|
// First pass (scale only in the y dir)
|
|
|
|
src.transform = t_y;
|
|
|
|
sampler_prelude(p->sc, pass_bind(p, src));
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// pass 1\n");
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_separated_gen(p->sc, scaler, 0, 1);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
GLSLF("color *= %f;\n", src.multiplier);
|
|
|
|
finish_pass_fbo(p, &scaler->sep_fbo, src.w, h, FBOTEX_FUZZY_H);
|
|
|
|
|
|
|
|
// Second pass (scale only in the x dir)
|
|
|
|
src = img_tex_fbo(&scaler->sep_fbo, t_x, src.type, src.components);
|
|
|
|
sampler_prelude(p->sc, pass_bind(p, src));
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// pass 2\n");
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_separated_gen(p->sc, scaler, 1, 0);
|
2015-03-15 05:27:11 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Sample from img_tex, with the src rectangle given by it.
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
// The dst rectangle is implicit by what the caller will do next, but w and h
|
|
|
|
// must still be what is going to be used (to dimension FBOs correctly).
|
2016-02-23 15:18:17 +00:00
|
|
|
// This will write the scaled contents to the vec4 "color".
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
// The scaler unit is initialized by this function; in order to avoid cache
|
|
|
|
// thrashing, the scaler unit should usually use the same parameters.
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
static void pass_sample(struct gl_video *p, struct img_tex tex,
|
|
|
|
struct scaler *scaler, const struct scaler_config *conf,
|
|
|
|
double scale_factor, int w, int h)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
reinit_scaler(p, scaler, conf, scale_factor, filter_sizes);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
bool is_separated = scaler->kernel && !scaler->kernel->polar;
|
|
|
|
|
|
|
|
// Set up the transformation+prelude and bind the texture, for everything
|
|
|
|
// other than separated scaling (which does this in the subfunction)
|
|
|
|
if (!is_separated)
|
|
|
|
sampler_prelude(p->sc, pass_bind(p, tex));
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
// Dispatch the scaler. They're all wildly different.
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
const char *name = scaler->conf.kernel.name;
|
|
|
|
if (strcmp(name, "bilinear") == 0) {
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = texture(tex, pos);)
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
} else if (strcmp(name, "bicubic_fast") == 0) {
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_bicubic_fast(p->sc);
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
} else if (strcmp(name, "oversample") == 0) {
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_oversample(p->sc, scaler, w, h);
|
2015-03-27 12:27:40 +00:00
|
|
|
} else if (strcmp(name, "custom") == 0) {
|
2015-09-23 20:13:03 +00:00
|
|
|
const char *body = load_cached_file(p, p->opts.scale_shader);
|
2015-03-27 12:27:40 +00:00
|
|
|
if (body) {
|
|
|
|
load_shader(p, body);
|
2016-01-25 19:24:41 +00:00
|
|
|
const char *fn_name = get_custom_shader_fn(p, body);
|
2015-03-27 12:27:40 +00:00
|
|
|
GLSLF("// custom scale-shader\n");
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSLF("color = %s(tex, pos, size);\n", fn_name);
|
2015-03-27 12:27:40 +00:00
|
|
|
} else {
|
|
|
|
p->opts.scale_shader = NULL;
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
} else if (scaler->kernel && scaler->kernel->polar) {
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_polar(p->sc, scaler);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
} else if (scaler->kernel) {
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
pass_sample_separated(p, tex, scaler, w, h);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
} else {
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Should never happen
|
|
|
|
abort();
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Apply any required multipliers. Separated scaling already does this in
|
|
|
|
// its first stage
|
|
|
|
if (!is_separated)
|
|
|
|
GLSLF("color *= %f;\n", tex.multiplier);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Micro-optimization: Avoid scaling unneeded channels
|
2016-03-05 11:38:51 +00:00
|
|
|
skip_unused(p, tex.components);
|
2015-01-29 14:50:21 +00:00
|
|
|
}
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
// Get the number of passes for prescaler, with given display size.
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
static int get_prescale_passes(struct gl_video *p, struct img_tex tex[4])
|
2015-10-26 22:43:48 +00:00
|
|
|
{
|
2016-03-05 11:02:01 +00:00
|
|
|
if (!p->opts.prescale_luma)
|
2015-10-26 22:43:48 +00:00
|
|
|
return 0;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
|
|
|
|
// Return 0 if no luma planes exist
|
|
|
|
for (int n = 0; ; n++) {
|
|
|
|
if (n > 4)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (tex[n].type == PLANE_LUMA)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
// The downscaling threshold check is turned off.
|
|
|
|
if (p->opts.prescale_downscaling_threshold < 1.0f)
|
|
|
|
return p->opts.prescale_passes;
|
|
|
|
|
|
|
|
double scale_factors[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, true, scale_factors);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
|
|
|
int passes = 0;
|
|
|
|
for (; passes < p->opts.prescale_passes; passes ++) {
|
|
|
|
// The scale factor happens to be the same for superxbr and nnedi3.
|
|
|
|
scale_factors[0] /= 2;
|
|
|
|
scale_factors[1] /= 2;
|
|
|
|
|
|
|
|
if (1.0f / scale_factors[0] > p->opts.prescale_downscaling_threshold)
|
|
|
|
break;
|
|
|
|
if (1.0f / scale_factors[1] > p->opts.prescale_downscaling_threshold)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return passes;
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Upload the NNEDI3 UBO weights only if needed
|
|
|
|
static void upload_nnedi3_weights(struct gl_video *p)
|
2015-10-26 22:43:48 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
GL *gl = p->gl;
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
if (p->opts.nnedi3_opts->upload == NNEDI3_UPLOAD_UBO &&
|
|
|
|
!p->nnedi3_weights_buffer)
|
|
|
|
{
|
|
|
|
gl->GenBuffers(1, &p->nnedi3_weights_buffer);
|
|
|
|
gl->BindBufferBase(GL_UNIFORM_BUFFER, 0, p->nnedi3_weights_buffer);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int size;
|
|
|
|
const float *weights = get_nnedi3_weights(p->opts.nnedi3_opts, &size);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
MP_VERBOSE(p, "Uploading NNEDI3 weights via UBO (size=%d)\n", size);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// We don't know the endianness of GPU, just assume it's LE
|
|
|
|
gl->BufferData(GL_UNIFORM_BUFFER, size, weights, GL_STATIC_DRAW);
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Applies a single pass of the prescaler, and accumulates the offset in
|
|
|
|
// pass_transform.
|
2016-03-05 11:02:01 +00:00
|
|
|
static void pass_prescale_luma(struct gl_video *p, struct img_tex *tex,
|
|
|
|
struct gl_transform *pass_transform,
|
|
|
|
struct fbotex fbo[MAX_PRESCALE_STEPS])
|
2015-10-26 22:43:48 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Happens to be the same for superxbr and nnedi3.
|
|
|
|
const int num_steps = 2;
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
for (int step = 0; step < num_steps; step++) {
|
|
|
|
struct gl_transform step_transform = {{{0}}};
|
|
|
|
int id = pass_bind(p, *tex);
|
2016-03-05 11:38:51 +00:00
|
|
|
int planes = tex->components;
|
2015-10-26 22:43:48 +00:00
|
|
|
|
2016-03-05 11:02:01 +00:00
|
|
|
switch(p->opts.prescale_luma) {
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
case 1:
|
2016-03-07 16:34:47 +00:00
|
|
|
assert(planes == 1);
|
|
|
|
pass_superxbr(p->sc, id, step, tex->multiplier,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
p->opts.superxbr_opts, &step_transform);
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
upload_nnedi3_weights(p);
|
2016-03-05 11:38:51 +00:00
|
|
|
pass_nnedi3(p->gl, p->sc, planes, id, step, tex->multiplier,
|
2016-04-05 18:57:02 +00:00
|
|
|
p->opts.nnedi3_opts, &step_transform, tex->gl_target);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
abort();
|
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int new_w = tex->w * (int)step_transform.m[0][0],
|
|
|
|
new_h = tex->h * (int)step_transform.m[1][1];
|
2015-10-26 22:43:48 +00:00
|
|
|
|
2016-03-05 11:38:51 +00:00
|
|
|
skip_unused(p, planes);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &fbo[step], new_w, new_h, 0);
|
|
|
|
*tex = img_tex_fbo(&fbo[step], identity_trans, tex->type, tex->components);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Accumulate the local transform
|
|
|
|
gl_transform_trans(step_transform, pass_transform);
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Copy a texture to the vec4 color, while increasing offset. Also applies
|
|
|
|
// the texture multiplier to the sampled color
|
|
|
|
static void copy_img_tex(struct gl_video *p, int *offset, struct img_tex img)
|
|
|
|
{
|
|
|
|
int count = img.components;
|
|
|
|
assert(*offset + count <= 4);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int id = pass_bind(p, img);
|
2016-03-17 11:44:53 +00:00
|
|
|
char src[5] = {0};
|
|
|
|
char dst[5] = {0};
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
const char *tex_fmt = img.swizzle[0] ? img.swizzle : "rgba";
|
2016-03-17 11:44:53 +00:00
|
|
|
const char *dst_fmt = "rgba";
|
|
|
|
for (int i = 0; i < count; i++) {
|
|
|
|
src[i] = tex_fmt[i];
|
|
|
|
dst[i] = dst_fmt[*offset + i];
|
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
if (img.use_integer) {
|
|
|
|
uint64_t tex_max = 1ull << p->image_desc.component_full_bits;
|
|
|
|
img.multiplier *= 1.0 / (tex_max - 1);
|
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
GLSLF("color.%s = %f * vec4(texture(texture%d, texcoord%d)).%s;\n",
|
2016-03-17 11:44:53 +00:00
|
|
|
dst, img.multiplier, id, id, src);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
*offset += count;
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// sample from video textures, set "color" variable to yuv value
|
|
|
|
static void pass_read_video(struct gl_video *p)
|
2016-01-26 19:47:32 +00:00
|
|
|
{
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct img_tex tex[4];
|
|
|
|
pass_get_img_tex(p, &p->image, tex);
|
|
|
|
|
|
|
|
// Most of the steps here don't actually apply image transformations yet,
|
|
|
|
// save for the actual upscaling - so as a code convenience we store them
|
|
|
|
// separately
|
|
|
|
struct gl_transform transforms[4];
|
|
|
|
struct gl_transform tex_trans = identity_trans;
|
|
|
|
for (int i = 0; i < 4; i++) {
|
|
|
|
transforms[i] = tex[i].transform;
|
|
|
|
tex[i].transform = identity_trans;
|
|
|
|
}
|
2016-01-26 19:47:32 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int prescale_passes = get_prescale_passes(p, tex);
|
2016-01-26 19:47:32 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int dst_w = p->texture_w << prescale_passes,
|
|
|
|
dst_h = p->texture_h << prescale_passes;
|
2016-01-27 20:08:30 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
bool needs_deband[4];
|
|
|
|
int scaler_id[4]; // ID if needed, -1 otherwise
|
|
|
|
int needs_prescale[4]; // number of prescaling passes left
|
|
|
|
|
|
|
|
// Determine what needs to be done for which plane
|
|
|
|
for (int i=0; i < 4; i++) {
|
|
|
|
enum plane_type type = tex[i].type;
|
|
|
|
if (type == PLANE_NONE) {
|
|
|
|
needs_deband[i] = false;
|
|
|
|
needs_prescale[i] = 0;
|
|
|
|
scaler_id[i] = -1;
|
2016-01-26 19:47:32 +00:00
|
|
|
continue;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
needs_deband[i] = type != PLANE_ALPHA ? p->opts.deband : false;
|
|
|
|
needs_prescale[i] = type == PLANE_LUMA ? prescale_passes : 0;
|
|
|
|
|
|
|
|
scaler_id[i] = -1;
|
|
|
|
switch (type) {
|
|
|
|
case PLANE_RGB:
|
|
|
|
case PLANE_LUMA:
|
|
|
|
case PLANE_XYZ:
|
2016-03-05 08:42:57 +00:00
|
|
|
scaler_id[i] = SCALER_SCALE;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
case PLANE_CHROMA:
|
2016-03-05 08:42:57 +00:00
|
|
|
scaler_id[i] = SCALER_CSCALE;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
case PLANE_ALPHA: // always use bilinear for alpha
|
|
|
|
default:
|
2016-01-27 20:08:30 +00:00
|
|
|
continue;
|
|
|
|
}
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
|
|
|
|
// We can skip scaling if the texture is already at the required size
|
|
|
|
if (tex[i].w == dst_w && tex[i].h == dst_h)
|
|
|
|
scaler_id[i] = -1;
|
2016-01-26 19:47:32 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Process all the planes that need some action performed
|
|
|
|
while (true) {
|
|
|
|
// Find next plane to operate on
|
|
|
|
int n = -1;
|
|
|
|
for (int i = 0; i < 4; i++) {
|
|
|
|
if (tex[i].type != PLANE_NONE &&
|
|
|
|
(scaler_id[i] >= 0 || needs_deband[i] || needs_prescale[i]))
|
|
|
|
{
|
|
|
|
n = i;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2016-01-26 19:47:32 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
if (n == -1) // no textures left
|
|
|
|
break;
|
|
|
|
|
|
|
|
// Figure out if it needs to be merged with anything else first
|
|
|
|
int o = -1;
|
|
|
|
for (int i = n+1; i < 4; i++) {
|
|
|
|
if (tex[i].type == tex[n].type
|
|
|
|
&& tex[i].w == tex[n].w
|
|
|
|
&& tex[i].h == tex[n].h
|
|
|
|
&& gl_transform_eq(transforms[i], transforms[n]))
|
|
|
|
{
|
|
|
|
o = i;
|
|
|
|
break;
|
|
|
|
}
|
2015-09-05 15:39:27 +00:00
|
|
|
}
|
2015-03-27 12:27:40 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Multiple planes share the same dimensions and type, merge them for
|
|
|
|
// upscaling/debanding efficiency
|
|
|
|
if (o != -1) {
|
|
|
|
GLSLF("// merging plane %d into %d\n", o, n);
|
|
|
|
|
|
|
|
int num = 0;
|
|
|
|
copy_img_tex(p, &num, tex[n]);
|
|
|
|
copy_img_tex(p, &num, tex[o]);
|
|
|
|
finish_pass_fbo(p, &p->merge_fbo[n], tex[n].w, tex[n].h, 0);
|
|
|
|
tex[n] = img_tex_fbo(&p->merge_fbo[n], identity_trans,
|
|
|
|
tex[n].type, num);
|
|
|
|
|
|
|
|
memset(&tex[o], 0, sizeof(tex[o]));
|
|
|
|
continue;
|
2015-09-05 15:39:27 +00:00
|
|
|
}
|
2015-04-12 01:34:38 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// The steps after this point (debanding, upscaling) can't handle
|
|
|
|
// integer textures, so the plane is still in that format by this point
|
|
|
|
// we need to ensure it gets converted
|
|
|
|
if (tex[n].use_integer) {
|
|
|
|
GLSLF("// use_integer fix for plane %d\n", n);
|
|
|
|
|
|
|
|
copy_img_tex(p, &(int){0}, tex[n]);
|
|
|
|
finish_pass_fbo(p, &p->integer_fbo[n], tex[n].w, tex[n].h, 0);
|
|
|
|
tex[n] = img_tex_fbo(&p->integer_fbo[n], identity_trans,
|
|
|
|
tex[n].type, tex[n].components);
|
|
|
|
continue;
|
2015-03-27 12:27:40 +00:00
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Plane is not yet debanded
|
|
|
|
if (needs_deband[n]) {
|
|
|
|
GLSLF("// debanding plane %d\n", n);
|
2015-03-27 12:27:40 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int id = pass_bind(p, tex[n]);
|
|
|
|
pass_sample_deband(p->sc, p->opts.deband_opts, id, tex[n].multiplier,
|
2016-03-27 14:46:01 +00:00
|
|
|
tex[n].gl_target, &p->lfg);
|
2016-03-05 11:38:51 +00:00
|
|
|
skip_unused(p, tex[n].components);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &p->deband_fbo[n], tex[n].w, tex[n].h, 0);
|
|
|
|
tex[n] = img_tex_fbo(&p->deband_fbo[n], identity_trans,
|
|
|
|
tex[n].type, tex[n].components);
|
|
|
|
|
|
|
|
needs_deband[n] = false;
|
|
|
|
continue;
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
2015-03-27 12:27:40 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Plane still needs prescaling passes
|
|
|
|
if (needs_prescale[n]) {
|
|
|
|
GLSLF("// prescaling plane %d (%d left)\n", n, needs_prescale[n]);
|
2016-03-05 11:02:01 +00:00
|
|
|
pass_prescale_luma(p, &tex[n], &tex_trans,
|
|
|
|
p->prescale_fbo[needs_prescale[n]-1]);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
needs_prescale[n]--;
|
|
|
|
|
|
|
|
// We can skip scaling if we arrived at our target res
|
|
|
|
if (tex[n].w == dst_w && tex[n].h == dst_h)
|
|
|
|
scaler_id[n] = -1;
|
|
|
|
|
|
|
|
// If we're done prescaling, we need to adjust all of the
|
|
|
|
// other transforms to make sure the planes still align
|
|
|
|
if (needs_prescale[n] == 0) {
|
|
|
|
for (int i = 0; i < 4; i++) {
|
|
|
|
if (n == i)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
transforms[i].t[0] -= tex_trans.t[0] / tex_trans.m[0][0];
|
|
|
|
transforms[i].t[1] -= tex_trans.t[1] / tex_trans.m[1][1];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
continue;
|
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Plane is not yet upscaled
|
|
|
|
if (scaler_id[n] >= 0) {
|
|
|
|
const struct scaler_config *conf = &p->opts.scaler[scaler_id[n]];
|
|
|
|
struct scaler *scaler = &p->scaler[scaler_id[n]];
|
|
|
|
|
|
|
|
// This is the only step that actually uses the transform
|
|
|
|
tex[n].transform = transforms[n];
|
|
|
|
|
|
|
|
// Bilinear scaling is a no-op due to GPU sampling
|
|
|
|
if (strcmp(conf->kernel.name, "bilinear") != 0) {
|
|
|
|
GLSLF("// upscaling plane %d\n", n);
|
|
|
|
pass_sample(p, tex[n], scaler, conf, 1.0, dst_w, dst_h);
|
|
|
|
finish_pass_fbo(p, &p->scale_fbo[n], dst_w, dst_h, FBOTEX_FUZZY);
|
|
|
|
tex[n] = img_tex_fbo(&p->scale_fbo[n], identity_trans,
|
|
|
|
tex[n].type, tex[n].components);
|
|
|
|
transforms[n] = identity_trans;
|
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
scaler_id[n] = -1;
|
|
|
|
continue;
|
2015-10-26 22:43:48 +00:00
|
|
|
}
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
|
|
|
|
// Execution should never reach this point
|
|
|
|
abort();
|
2015-03-27 12:27:40 +00:00
|
|
|
}
|
2015-10-26 22:43:48 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// All planes are of the same size and properly aligned at this point
|
|
|
|
GLSLF("// combining planes\n");
|
|
|
|
int coord = 0;
|
|
|
|
for (int i = 0; i < 4; i++) {
|
|
|
|
if (tex[i].type != PLANE_NONE)
|
|
|
|
copy_img_tex(p, &coord, tex[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
p->texture_w = dst_w;
|
|
|
|
p->texture_h = dst_h;
|
|
|
|
p->texture_offset = tex_trans;
|
2016-03-05 11:38:51 +00:00
|
|
|
p->components = coord;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// Utility function that simply binds an FBO and reads from it, without any
|
|
|
|
// transformations. Returns the ID of the texture unit it was bound to
|
|
|
|
static int pass_read_fbo(struct gl_video *p, struct fbotex *fbo)
|
|
|
|
{
|
2016-03-05 11:38:51 +00:00
|
|
|
struct img_tex tex = img_tex_fbo(fbo, identity_trans, PLANE_RGB, p->components);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
copy_img_tex(p, &(int){0}, tex);
|
|
|
|
|
|
|
|
return pass_bind(p, tex);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// yuv conversion, and any other conversions before main up/down-scaling
|
|
|
|
static void pass_convert_yuv(struct gl_video *p)
|
|
|
|
{
|
|
|
|
struct gl_shader_cache *sc = p->sc;
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
struct mp_csp_params cparams = MP_CSP_PARAMS_DEFAULTS;
|
|
|
|
cparams.gray = p->is_yuv && !p->is_packed_yuv && p->plane_count == 1;
|
|
|
|
cparams.input_bits = p->image_desc.component_bits;
|
2015-12-07 22:41:29 +00:00
|
|
|
cparams.texture_bits = p->image_desc.component_full_bits;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
mp_csp_set_image_params(&cparams, &p->image_params);
|
|
|
|
mp_csp_copy_equalizer_values(&cparams, &p->video_eq);
|
2015-03-23 01:42:19 +00:00
|
|
|
p->user_gamma = 1.0 / (cparams.gamma * p->opts.gamma);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// color conversion\n");
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
if (p->color_swizzle[0])
|
|
|
|
GLSLF("color = color.%s;\n", p->color_swizzle);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Pre-colormatrix input gamma correction
|
2015-12-09 16:10:38 +00:00
|
|
|
if (cparams.colorspace == MP_CSP_XYZ)
|
|
|
|
GLSL(color.rgb = pow(color.rgb, vec3(2.6));) // linear light
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// We always explicitly normalize the range in pass_read_video
|
|
|
|
cparams.input_bits = cparams.texture_bits = 0;
|
2015-03-27 12:27:40 +00:00
|
|
|
|
2015-12-07 22:45:41 +00:00
|
|
|
// Conversion to RGB. For RGB itself, this still applies e.g. brightness
|
|
|
|
// and contrast controls, or expansion of e.g. LSB-packed 10 bit data.
|
|
|
|
struct mp_cmat m = {{{0}}};
|
2015-12-08 23:22:12 +00:00
|
|
|
mp_get_csp_matrix(&cparams, &m);
|
2015-12-07 22:45:41 +00:00
|
|
|
gl_sc_uniform_mat3(sc, "colormatrix", true, &m.m[0][0]);
|
|
|
|
gl_sc_uniform_vec3(sc, "colormatrix_c", m.c);
|
|
|
|
|
|
|
|
GLSL(color.rgb = mat3(colormatrix) * color.rgb + colormatrix_c;)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
|
|
|
if (p->image_params.colorspace == MP_CSP_BT_2020_C) {
|
|
|
|
// Conversion for C'rcY'cC'bc via the BT.2020 CL system:
|
|
|
|
// C'bc = (B'-Y'c) / 1.9404 | C'bc <= 0
|
|
|
|
// = (B'-Y'c) / 1.5816 | C'bc > 0
|
|
|
|
//
|
|
|
|
// C'rc = (R'-Y'c) / 1.7184 | C'rc <= 0
|
|
|
|
// = (R'-Y'c) / 0.9936 | C'rc > 0
|
|
|
|
//
|
|
|
|
// as per the BT.2020 specification, table 4. This is a non-linear
|
|
|
|
// transformation because (constant) luminance receives non-equal
|
|
|
|
// contributions from the three different channels.
|
|
|
|
GLSLF("// constant luminance conversion\n");
|
|
|
|
GLSL(color.br = color.br * mix(vec2(1.5816, 0.9936),
|
|
|
|
vec2(1.9404, 1.7184),
|
|
|
|
lessThanEqual(color.br, vec2(0)))
|
|
|
|
+ color.gg;)
|
|
|
|
// Expand channels to camera-linear light. This shader currently just
|
|
|
|
// assumes everything uses the BT.2020 12-bit gamma function, since the
|
|
|
|
// difference between 10 and 12-bit is negligible for anything other
|
|
|
|
// than 12-bit content.
|
|
|
|
GLSL(color.rgb = mix(color.rgb / vec3(4.5),
|
|
|
|
pow((color.rgb + vec3(0.0993))/vec3(1.0993), vec3(1.0/0.45)),
|
|
|
|
lessThanEqual(vec3(0.08145), color.rgb));)
|
|
|
|
// Calculate the green channel from the expanded RYcB
|
|
|
|
// The BT.2020 specification says Yc = 0.2627*R + 0.6780*G + 0.0593*B
|
|
|
|
GLSL(color.g = (color.g - 0.2627*color.r - 0.0593*color.b)/0.6780;)
|
2015-03-14 02:04:23 +00:00
|
|
|
// Recompress to receive the R'G'B' result, same as other systems
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
GLSL(color.rgb = mix(color.rgb * vec3(4.5),
|
|
|
|
vec3(1.0993) * pow(color.rgb, vec3(0.45)) - vec3(0.0993),
|
|
|
|
lessThanEqual(vec3(0.0181), color.rgb));)
|
|
|
|
}
|
|
|
|
|
2016-03-05 11:38:51 +00:00
|
|
|
p->components = 3;
|
2015-03-13 20:29:04 +00:00
|
|
|
if (!p->has_alpha || p->opts.alpha_mode == 0) { // none
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
GLSL(color.a = 1.0;)
|
2015-12-22 22:14:47 +00:00
|
|
|
} else if (p->opts.alpha_mode == 2) { // blend against black
|
2015-03-13 20:29:04 +00:00
|
|
|
GLSL(color = vec4(color.rgb * color.a, 1.0);)
|
2016-03-05 11:38:51 +00:00
|
|
|
} else { // alpha present in image
|
|
|
|
p->components = 4;
|
2016-03-29 19:56:38 +00:00
|
|
|
GLSL(color = vec4(color.rgb * color.a, color.a);)
|
2015-03-13 20:29:04 +00:00
|
|
|
}
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
}
|
|
|
|
|
2016-04-08 20:21:31 +00:00
|
|
|
static void get_scale_factors(struct gl_video *p, bool transpose_rot, double xy[2])
|
2014-11-23 19:06:05 +00:00
|
|
|
{
|
2016-03-28 14:30:48 +00:00
|
|
|
double target_w = p->src_rect.x1 - p->src_rect.x0;
|
|
|
|
double target_h = p->src_rect.y1 - p->src_rect.y0;
|
2016-04-08 20:21:31 +00:00
|
|
|
if (transpose_rot && p->image_params.rotate % 180 == 90)
|
2016-03-28 14:30:48 +00:00
|
|
|
MPSWAP(double, target_w, target_h);
|
|
|
|
xy[0] = (p->dst_rect.x1 - p->dst_rect.x0) / target_w;
|
|
|
|
xy[1] = (p->dst_rect.y1 - p->dst_rect.y0) / target_h;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
2016-04-08 20:21:31 +00:00
|
|
|
// Cropping.
|
2016-03-28 14:30:48 +00:00
|
|
|
static void compute_src_transform(struct gl_video *p, struct gl_transform *tr)
|
2015-09-07 19:02:49 +00:00
|
|
|
{
|
2015-10-23 17:52:03 +00:00
|
|
|
float sx = (p->src_rect.x1 - p->src_rect.x0) / (float)p->texture_w,
|
|
|
|
sy = (p->src_rect.y1 - p->src_rect.y0) / (float)p->texture_h,
|
2015-09-07 19:02:49 +00:00
|
|
|
ox = p->src_rect.x0,
|
|
|
|
oy = p->src_rect.y0;
|
2016-03-28 14:30:48 +00:00
|
|
|
struct gl_transform transform = {{{sx, 0}, {0, sy}}, {ox, oy}};
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
gl_transform_trans(p->texture_offset, &transform);
|
|
|
|
|
2015-09-07 19:02:49 +00:00
|
|
|
*tr = transform;
|
|
|
|
}
|
|
|
|
|
2015-03-15 21:52:34 +00:00
|
|
|
// Takes care of the main scaling and pre/post-conversions
|
|
|
|
static void pass_scale_main(struct gl_video *p)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
|
|
|
// Figure out the main scaler.
|
|
|
|
double xy[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, true, xy);
|
2015-10-26 22:43:48 +00:00
|
|
|
|
|
|
|
// actual scale factor should be divided by the scale factor of prescaling.
|
|
|
|
xy[0] /= p->texture_offset.m[0][0];
|
|
|
|
xy[1] /= p->texture_offset.m[1][1];
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
bool downscaling = xy[0] < 1.0 || xy[1] < 1.0;
|
|
|
|
bool upscaling = !downscaling && (xy[0] > 1.0 || xy[1] > 1.0);
|
|
|
|
double scale_factor = 1.0;
|
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
struct scaler *scaler = &p->scaler[SCALER_SCALE];
|
|
|
|
struct scaler_config scaler_conf = p->opts.scaler[SCALER_SCALE];
|
2015-10-26 22:43:48 +00:00
|
|
|
if (p->opts.scaler_resizes_only && !downscaling && !upscaling) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
scaler_conf.kernel.name = "bilinear";
|
2015-10-26 22:43:48 +00:00
|
|
|
// bilinear is going to be used, just remove all sub-pixel offsets.
|
|
|
|
p->texture_offset.t[0] = (int)p->texture_offset.t[0];
|
|
|
|
p->texture_offset.t[1] = (int)p->texture_offset.t[1];
|
|
|
|
}
|
2016-03-05 08:42:57 +00:00
|
|
|
if (downscaling && p->opts.scaler[SCALER_DSCALE].kernel.name) {
|
|
|
|
scaler_conf = p->opts.scaler[SCALER_DSCALE];
|
|
|
|
scaler = &p->scaler[SCALER_DSCALE];
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
2015-11-07 16:49:14 +00:00
|
|
|
// When requesting correct-downscaling and the clip is anamorphic, and
|
|
|
|
// because only a single scale factor is used for both axes, enable it only
|
2015-08-10 00:57:53 +00:00
|
|
|
// when both axes are downscaled, and use the milder of the factors to not
|
|
|
|
// end up with too much blur on one axis (even if we end up with sub-optimal
|
2015-11-07 16:49:14 +00:00
|
|
|
// scale factor on the other axis). This is better than not respecting
|
|
|
|
// correct scaling at all for anamorphic clips.
|
2015-08-10 00:57:53 +00:00
|
|
|
double f = MPMAX(xy[0], xy[1]);
|
2015-11-07 16:49:14 +00:00
|
|
|
if (p->opts.correct_downscaling && f < 1.0)
|
2015-08-10 00:57:53 +00:00
|
|
|
scale_factor = 1.0 / f;
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Pre-conversion, like linear light/sigmoidization
|
|
|
|
GLSLF("// scaler pre-conversion\n");
|
2015-09-05 09:39:20 +00:00
|
|
|
if (p->use_linear)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_linearize(p->sc, p->image_params.gamma);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-03-15 21:52:34 +00:00
|
|
|
bool use_sigmoid = p->use_linear && p->opts.sigmoid_upscaling && upscaling;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
float sig_center, sig_slope, sig_offset, sig_scale;
|
|
|
|
if (use_sigmoid) {
|
|
|
|
// Coefficients for the sigmoidal transform are taken from the
|
|
|
|
// formula here: http://www.imagemagick.org/Usage/color_mods/#sigmoidal
|
|
|
|
sig_center = p->opts.sigmoid_center;
|
|
|
|
sig_slope = p->opts.sigmoid_slope;
|
|
|
|
// This function needs to go through (0,0) and (1,1) so we compute the
|
|
|
|
// values at 1 and 0, and then scale/shift them, respectively.
|
|
|
|
sig_offset = 1.0/(1+expf(sig_slope * sig_center));
|
|
|
|
sig_scale = 1.0/(1+expf(sig_slope * (sig_center-1))) - sig_offset;
|
|
|
|
GLSLF("color.rgb = %f - log(1.0/(color.rgb * %f + %f) - 1.0)/%f;\n",
|
|
|
|
sig_center, sig_scale, sig_offset, sig_slope);
|
|
|
|
}
|
|
|
|
|
2016-03-28 14:30:48 +00:00
|
|
|
int vp_w = p->dst_rect.x1 - p->dst_rect.x0;
|
|
|
|
int vp_h = p->dst_rect.y1 - p->dst_rect.y0;
|
2015-09-07 19:02:49 +00:00
|
|
|
struct gl_transform transform;
|
2016-03-28 14:30:48 +00:00
|
|
|
compute_src_transform(p, &transform);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// main scaling\n");
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &p->indirect_fbo, p->texture_w, p->texture_h, 0);
|
2016-03-05 11:38:51 +00:00
|
|
|
struct img_tex src = img_tex_fbo(&p->indirect_fbo, transform,
|
|
|
|
PLANE_RGB, p->components);
|
|
|
|
pass_sample(p, src, scaler, &scaler_conf, scale_factor, vp_w, vp_h);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-10-23 17:52:03 +00:00
|
|
|
// Changes the texture size to display size after main scaler.
|
|
|
|
p->texture_w = vp_w;
|
|
|
|
p->texture_h = vp_h;
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
GLSLF("// scaler post-conversion\n");
|
|
|
|
if (use_sigmoid) {
|
|
|
|
// Inverse of the transformation above
|
|
|
|
GLSLF("color.rgb = (1.0/(1.0 + exp(%f * (%f - color.rgb))) - %f) / %f;\n",
|
|
|
|
sig_slope, sig_center, sig_offset, sig_scale);
|
|
|
|
}
|
2015-03-15 21:52:34 +00:00
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-03-27 10:12:46 +00:00
|
|
|
// Adapts the colors from the given color space to the display device's native
|
|
|
|
// gamut.
|
|
|
|
static void pass_colormanage(struct gl_video *p, enum mp_csp_prim prim_src,
|
|
|
|
enum mp_csp_trc trc_src)
|
2015-03-15 21:52:34 +00:00
|
|
|
{
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
GLSLF("// color management\n");
|
|
|
|
enum mp_csp_trc trc_dst = p->opts.target_trc;
|
2015-03-27 10:12:46 +00:00
|
|
|
enum mp_csp_prim prim_dst = p->opts.target_prim;
|
2015-03-23 01:42:19 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
if (p->use_lut_3d) {
|
2016-02-13 14:33:00 +00:00
|
|
|
// The 3DLUT is always generated against the original source space
|
|
|
|
enum mp_csp_prim prim_orig = p->image_params.primaries;
|
|
|
|
enum mp_csp_trc trc_orig = p->image_params.gamma;
|
|
|
|
|
|
|
|
if (gl_video_get_lut3d(p, prim_orig, trc_orig)) {
|
|
|
|
prim_dst = prim_orig;
|
|
|
|
trc_dst = trc_orig;
|
|
|
|
} else {
|
|
|
|
p->use_lut_3d = false;
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (prim_dst == MP_CSP_PRIM_AUTO)
|
|
|
|
prim_dst = prim_src;
|
|
|
|
if (trc_dst == MP_CSP_TRC_AUTO) {
|
2015-03-27 10:12:46 +00:00
|
|
|
trc_dst = trc_src;
|
|
|
|
// Avoid outputting linear light at all costs
|
|
|
|
if (trc_dst == MP_CSP_TRC_LINEAR)
|
|
|
|
trc_dst = p->image_params.gamma;
|
|
|
|
if (trc_dst == MP_CSP_TRC_LINEAR)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
trc_dst = MP_CSP_TRC_GAMMA22;
|
|
|
|
}
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
bool need_gamma = trc_src != trc_dst || prim_src != prim_dst;
|
2015-03-29 04:34:34 +00:00
|
|
|
if (need_gamma)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_linearize(p->sc, trc_src);
|
2016-02-13 14:33:00 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Adapt to the right colorspace if necessary
|
|
|
|
if (prim_src != prim_dst) {
|
|
|
|
struct mp_csp_primaries csp_src = mp_get_csp_primaries(prim_src),
|
|
|
|
csp_dst = mp_get_csp_primaries(prim_dst);
|
|
|
|
float m[3][3] = {{0}};
|
|
|
|
mp_get_cms_matrix(csp_src, csp_dst, MP_INTENT_RELATIVE_COLORIMETRIC, m);
|
|
|
|
gl_sc_uniform_mat3(p->sc, "cms_matrix", true, &m[0][0]);
|
|
|
|
GLSL(color.rgb = cms_matrix * color.rgb;)
|
|
|
|
}
|
2016-02-13 14:33:00 +00:00
|
|
|
|
|
|
|
if (need_gamma)
|
|
|
|
pass_delinearize(p->sc, trc_dst);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
if (p->use_lut_3d) {
|
|
|
|
gl_sc_uniform_sampler(p->sc, "lut_3d", GL_TEXTURE_3D, TEXUNIT_3DLUT);
|
|
|
|
GLSL(color.rgb = texture3D(lut_3d, color.rgb).rgb;)
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
2014-11-23 19:06:05 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
static void pass_dither(struct gl_video *p)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
// Assume 8 bits per component if unknown.
|
2015-12-19 10:56:19 +00:00
|
|
|
int dst_depth = gl->fb_g ? gl->fb_g : 8;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
if (p->opts.dither_depth > 0)
|
|
|
|
dst_depth = p->opts.dither_depth;
|
|
|
|
|
|
|
|
if (p->opts.dither_depth < 0 || p->opts.dither_algo < 0)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!p->dither_texture) {
|
|
|
|
MP_VERBOSE(p, "Dither to %d.\n", dst_depth);
|
|
|
|
|
|
|
|
int tex_size;
|
|
|
|
void *tex_data;
|
2016-05-12 18:08:49 +00:00
|
|
|
GLint tex_iformat = 0;
|
|
|
|
GLint tex_format = 0;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLenum tex_type;
|
|
|
|
unsigned char temp[256];
|
|
|
|
|
|
|
|
if (p->opts.dither_algo == 0) {
|
|
|
|
int sizeb = p->opts.dither_size;
|
|
|
|
int size = 1 << sizeb;
|
|
|
|
|
|
|
|
if (p->last_dither_matrix_size != size) {
|
|
|
|
p->last_dither_matrix = talloc_realloc(p, p->last_dither_matrix,
|
|
|
|
float, size * size);
|
|
|
|
mp_make_fruit_dither_matrix(p->last_dither_matrix, sizeb);
|
|
|
|
p->last_dither_matrix_size = size;
|
|
|
|
}
|
|
|
|
|
2015-12-08 22:22:08 +00:00
|
|
|
// Prefer R16 texture since they provide higher precision.
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_unorm_format(gl, 2, 1);
|
|
|
|
if (!fmt || gl->es)
|
|
|
|
fmt = gl_find_float16_format(gl, 1);
|
|
|
|
tex_size = size;
|
|
|
|
if (fmt) {
|
2015-12-08 22:22:08 +00:00
|
|
|
tex_iformat = fmt->internal_format;
|
|
|
|
tex_format = fmt->format;
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
tex_type = GL_FLOAT;
|
|
|
|
tex_data = p->last_dither_matrix;
|
|
|
|
} else {
|
|
|
|
assert(sizeof(temp) >= 8 * 8);
|
|
|
|
mp_make_ordered_dither_matrix(temp, 8);
|
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_unorm_format(gl, 1, 1);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
tex_size = 8;
|
|
|
|
tex_iformat = fmt->internal_format;
|
|
|
|
tex_format = fmt->format;
|
|
|
|
tex_type = fmt->type;
|
|
|
|
tex_data = temp;
|
|
|
|
}
|
|
|
|
|
|
|
|
p->dither_size = tex_size;
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + TEXUNIT_DITHER);
|
|
|
|
gl->GenTextures(1, &p->dither_texture);
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, p->dither_texture);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 1);
|
|
|
|
gl->TexImage2D(GL_TEXTURE_2D, 0, tex_iformat, tex_size, tex_size, 0,
|
2016-05-12 18:08:49 +00:00
|
|
|
tex_format, tex_type, tex_data);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
|
|
|
|
gl->TexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 4);
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2015-02-19 13:03:18 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
debug_check_gl(p, "dither setup");
|
vo_opengl: greatly increase smoothmotion performance
Instead of rendering and upscaling each video frame on every vsync, this
version of the algorithm only draws them once and caches the result,
so the only operation that has to run on every vsync is a cheap linear
interpolation, plus CMS/dithering.
On my machine, this is a huge speedup for 24 Hz content (on a 60 Hz
monitor), up to 120% faster. (The speedup is not quite 250% because of
the overhead that the larger FBOs and CMS provides)
In terms of the implementation, this commit basically swaps
interpolation and upscaling - upscaling is moved to inter_program, and
interpolation is moved to the final_program.
Furthermore, the main bulk of the frame rendering logic (upscaling etc.)
was moved to a separete function, which is called from
gl_video_interpolate_frame only if it's actually necessarily, and
skipped otherwise.
2015-02-20 21:12:02 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSLF("// dithering\n");
|
|
|
|
|
|
|
|
// This defines how many bits are considered significant for output on
|
|
|
|
// screen. The superfluous bits will be used for rounding according to the
|
|
|
|
// dither matrix. The precision of the source implicitly decides how many
|
|
|
|
// dither patterns can be visible.
|
2015-03-16 19:22:09 +00:00
|
|
|
int dither_quantization = (1 << dst_depth) - 1;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
gl_sc_uniform_sampler(p->sc, "dither", GL_TEXTURE_2D, TEXUNIT_DITHER);
|
|
|
|
|
2015-11-19 13:41:49 +00:00
|
|
|
GLSLF("vec2 dither_pos = gl_FragCoord.xy / %d.0;\n", p->dither_size);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
if (p->opts.temporal_dither) {
|
2015-07-20 17:09:22 +00:00
|
|
|
int phase = (p->frames_rendered / p->opts.temporal_dither_period) % 8u;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
float r = phase * (M_PI / 2); // rotate
|
|
|
|
float m = phase < 4 ? 1 : -1; // mirror
|
|
|
|
|
|
|
|
float matrix[2][2] = {{cos(r), -sin(r) },
|
|
|
|
{sin(r) * m, cos(r) * m}};
|
|
|
|
gl_sc_uniform_mat2(p->sc, "dither_trafo", true, &matrix[0][0]);
|
|
|
|
|
|
|
|
GLSL(dither_pos = dither_trafo * dither_pos;)
|
2015-02-19 13:03:18 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
GLSL(float dither_value = texture(dither, dither_pos).r;)
|
2015-11-19 13:41:49 +00:00
|
|
|
GLSLF("color = floor(color * %d.0 + dither_value + 0.5 / %d.0) / %d.0;\n",
|
|
|
|
dither_quantization, p->dither_size * p->dither_size,
|
2015-03-16 19:22:09 +00:00
|
|
|
dither_quantization);
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
}
|
|
|
|
|
2015-03-29 04:34:34 +00:00
|
|
|
// Draws the OSD, in scene-referred colors.. If cms is true, subtitles are
|
|
|
|
// instead adapted to the display's gamut.
|
2015-03-23 01:42:19 +00:00
|
|
|
static void pass_draw_osd(struct gl_video *p, int draw_flags, double pts,
|
|
|
|
struct mp_osd_res rect, int vp_w, int vp_h, int fbo,
|
2015-03-29 04:34:34 +00:00
|
|
|
bool cms)
|
2015-03-23 01:42:19 +00:00
|
|
|
{
|
|
|
|
mpgl_osd_generate(p->osd, rect, pts, p->image_params.stereo_out, draw_flags);
|
|
|
|
|
|
|
|
p->gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
for (int n = 0; n < MAX_OSD_PARTS; n++) {
|
|
|
|
enum sub_bitmap_format fmt = mpgl_osd_get_part_format(p->osd, n);
|
|
|
|
if (!fmt)
|
|
|
|
continue;
|
|
|
|
gl_sc_uniform_sampler(p->sc, "osdtex", GL_TEXTURE_2D, 0);
|
|
|
|
switch (fmt) {
|
|
|
|
case SUBBITMAP_RGBA: {
|
|
|
|
GLSLF("// OSD (RGBA)\n");
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = texture(osdtex, texcoord).bgra;)
|
2015-03-23 01:42:19 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
case SUBBITMAP_LIBASS: {
|
|
|
|
GLSLF("// OSD (libass)\n");
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color =
|
2015-03-23 01:42:19 +00:00
|
|
|
vec4(ass_color.rgb, ass_color.a * texture(osdtex, texcoord).r);)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
abort();
|
|
|
|
}
|
2015-03-29 04:34:34 +00:00
|
|
|
// Subtitle color management, they're assumed to be sRGB by default
|
|
|
|
if (cms)
|
2015-03-27 10:12:46 +00:00
|
|
|
pass_colormanage(p, MP_CSP_PRIM_BT_709, MP_CSP_TRC_SRGB);
|
2015-03-23 01:42:19 +00:00
|
|
|
gl_sc_set_vao(p->sc, mpgl_osd_get_vao(p->osd));
|
|
|
|
gl_sc_gen_shader_and_reset(p->sc);
|
|
|
|
mpgl_osd_draw_part(p->osd, vp_w, vp_h, n);
|
|
|
|
}
|
|
|
|
gl_sc_set_vao(p->sc, &p->vao);
|
|
|
|
}
|
|
|
|
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
// Minimal rendering code path, for GLES or OpenGL 2.1 without proper FBOs.
|
|
|
|
static void pass_render_frame_dumb(struct gl_video *p, int fbo)
|
|
|
|
{
|
|
|
|
p->gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct img_tex tex[4];
|
|
|
|
pass_get_img_tex(p, &p->image, tex);
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
|
|
|
|
struct gl_transform transform;
|
2016-03-28 14:30:48 +00:00
|
|
|
compute_src_transform(p, &transform);
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
|
|
|
|
struct gl_transform tchroma = transform;
|
|
|
|
tchroma.t[0] /= 1 << p->image_desc.chroma_xs;
|
|
|
|
tchroma.t[1] /= 1 << p->image_desc.chroma_ys;
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
int index = 0;
|
|
|
|
for (int i = 0; i < p->plane_count; i++) {
|
|
|
|
gl_transform_trans(tex[i].type == PLANE_CHROMA ? tchroma : transform,
|
|
|
|
&tex[i].transform);
|
|
|
|
copy_img_tex(p, &index, tex[i]);
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
pass_convert_yuv(p);
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// The main rendering function, takes care of everything up to and including
|
2015-07-17 21:21:04 +00:00
|
|
|
// upscaling. p->image is rendered.
|
|
|
|
static void pass_render_frame(struct gl_video *p)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
{
|
2015-10-23 17:52:03 +00:00
|
|
|
// initialize the texture parameters
|
|
|
|
p->texture_w = p->image_params.w;
|
|
|
|
p->texture_h = p->image_params.h;
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
p->texture_offset = identity_trans;
|
2016-03-05 11:38:51 +00:00
|
|
|
p->components = 0;
|
2015-10-23 17:52:03 +00:00
|
|
|
|
2016-04-08 20:21:31 +00:00
|
|
|
if (p->image_params.rotate % 180 == 90)
|
|
|
|
MPSWAP(int, p->texture_w, p->texture_h);
|
|
|
|
|
2015-11-19 20:22:24 +00:00
|
|
|
if (p->dumb_mode)
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
return;
|
|
|
|
|
2015-04-02 09:13:51 +00:00
|
|
|
p->use_linear = p->opts.linear_scaling || p->opts.sigmoid_upscaling;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
pass_read_video(p);
|
|
|
|
pass_convert_yuv(p);
|
2015-04-10 20:19:44 +00:00
|
|
|
|
|
|
|
// For subtitles
|
|
|
|
double vpts = p->image.mpi->pts;
|
|
|
|
if (vpts == MP_NOPTS_VALUE)
|
|
|
|
vpts = p->osd_pts;
|
|
|
|
|
2015-04-11 17:16:34 +00:00
|
|
|
if (p->osd && p->opts.blend_subs == 2) {
|
2015-04-11 13:53:00 +00:00
|
|
|
double scale[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, false, scale);
|
2015-04-11 13:53:00 +00:00
|
|
|
struct mp_osd_res rect = {
|
2015-10-23 17:52:03 +00:00
|
|
|
.w = p->texture_w, .h = p->texture_h,
|
2015-04-11 13:53:00 +00:00
|
|
|
.display_par = scale[1] / scale[0], // counter compensate scaling
|
|
|
|
};
|
2016-04-05 18:58:22 +00:00
|
|
|
finish_pass_fbo(p, &p->blend_subs_fbo, rect.w, rect.h, 0);
|
2015-10-23 17:52:03 +00:00
|
|
|
pass_draw_osd(p, OSD_DRAW_SUB_ONLY, vpts, rect,
|
2016-04-05 18:58:22 +00:00
|
|
|
rect.w, rect.h, p->blend_subs_fbo.fbo, false);
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = texture(texture0, texcoord0);)
|
2016-03-22 12:34:39 +00:00
|
|
|
pass_read_fbo(p, &p->blend_subs_fbo);
|
2015-04-10 20:19:44 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
apply_shaders(p, p->opts.pre_shaders, p->texture_w, p->texture_h, p->pre_fbo);
|
2015-03-27 12:27:40 +00:00
|
|
|
|
2015-09-23 20:43:27 +00:00
|
|
|
if (p->opts.unsharp != 0.0) {
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &p->unsharp_fbo, p->texture_w, p->texture_h, 0);
|
2016-03-16 18:09:52 +00:00
|
|
|
int id = pass_read_fbo(p, &p->unsharp_fbo);
|
|
|
|
pass_sample_unsharp(p->sc, id, p->opts.unsharp);
|
2015-09-23 20:43:27 +00:00
|
|
|
}
|
|
|
|
|
2015-03-15 21:52:34 +00:00
|
|
|
pass_scale_main(p);
|
2015-03-23 01:42:19 +00:00
|
|
|
|
2015-03-27 12:27:40 +00:00
|
|
|
int vp_w = p->dst_rect.x1 - p->dst_rect.x0,
|
|
|
|
vp_h = p->dst_rect.y1 - p->dst_rect.y0;
|
2015-04-11 17:16:34 +00:00
|
|
|
if (p->osd && p->opts.blend_subs == 1) {
|
2015-03-23 01:42:19 +00:00
|
|
|
// Recreate the real video size from the src/dst rects
|
|
|
|
struct mp_osd_res rect = {
|
|
|
|
.w = vp_w, .h = vp_h,
|
2015-10-23 17:52:03 +00:00
|
|
|
.ml = -p->src_rect.x0, .mr = p->src_rect.x1 - p->image_params.w,
|
|
|
|
.mt = -p->src_rect.y0, .mb = p->src_rect.y1 - p->image_params.h,
|
2015-03-23 01:42:19 +00:00
|
|
|
.display_par = 1.0,
|
|
|
|
};
|
|
|
|
// Adjust margins for scale
|
|
|
|
double scale[2];
|
2016-04-08 20:21:31 +00:00
|
|
|
get_scale_factors(p, true, scale);
|
2015-03-23 01:42:19 +00:00
|
|
|
rect.ml *= scale[0]; rect.mr *= scale[0];
|
|
|
|
rect.mt *= scale[1]; rect.mb *= scale[1];
|
2015-03-29 04:34:34 +00:00
|
|
|
// We should always blend subtitles in non-linear light
|
|
|
|
if (p->use_linear)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_delinearize(p->sc, p->image_params.gamma);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
finish_pass_fbo(p, &p->blend_subs_fbo, p->texture_w, p->texture_h,
|
2015-10-23 17:52:03 +00:00
|
|
|
FBOTEX_FUZZY);
|
|
|
|
pass_draw_osd(p, OSD_DRAW_SUB_ONLY, vpts, rect,
|
|
|
|
p->texture_w, p->texture_h, p->blend_subs_fbo.fbo, false);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
pass_read_fbo(p, &p->blend_subs_fbo);
|
2015-03-29 04:34:34 +00:00
|
|
|
if (p->use_linear)
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_linearize(p->sc, p->image_params.gamma);
|
2015-03-23 01:42:19 +00:00
|
|
|
}
|
2015-03-27 12:27:40 +00:00
|
|
|
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
apply_shaders(p, p->opts.post_shaders, p->texture_w, p->texture_h, p->post_fbo);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void pass_draw_to_screen(struct gl_video *p, int fbo)
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
{
|
2015-11-19 20:22:24 +00:00
|
|
|
if (p->dumb_mode)
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
pass_render_frame_dumb(p, fbo);
|
|
|
|
|
2015-03-27 10:12:46 +00:00
|
|
|
// Adjust the overall gamma before drawing to screen
|
|
|
|
if (p->user_gamma != 1) {
|
|
|
|
gl_sc_uniform_f(p->sc, "user_gamma", p->user_gamma);
|
|
|
|
GLSL(color.rgb = clamp(color.rgb, 0.0, 1.0);)
|
|
|
|
GLSL(color.rgb = pow(color.rgb, vec3(user_gamma));)
|
|
|
|
}
|
2016-03-29 20:26:24 +00:00
|
|
|
|
2015-03-27 10:12:46 +00:00
|
|
|
pass_colormanage(p, p->image_params.primaries,
|
|
|
|
p->use_linear ? MP_CSP_TRC_LINEAR : p->image_params.gamma);
|
2016-03-29 20:26:24 +00:00
|
|
|
|
|
|
|
// Draw checkerboard pattern to indicate transparency
|
|
|
|
if (p->has_alpha && p->opts.alpha_mode == 3) {
|
|
|
|
GLSLF("// transparency checkerboard\n");
|
|
|
|
GLSL(bvec2 tile = lessThan(fract(gl_FragCoord.xy / 32.0), vec2(0.5));)
|
|
|
|
GLSL(vec3 background = vec3(tile.x == tile.y ? 1.0 : 0.75);)
|
|
|
|
GLSL(color.rgb = mix(background, color.rgb, color.a);)
|
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
pass_dither(p);
|
2016-03-28 14:30:48 +00:00
|
|
|
finish_pass_direct(p, fbo, p->vp_w, p->vp_h, &p->dst_rect);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// Draws an interpolate frame to fbo, based on the frame timing in t
|
2015-07-01 17:24:28 +00:00
|
|
|
static void gl_video_interpolate_frame(struct gl_video *p, struct vo_frame *t,
|
|
|
|
int fbo)
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
{
|
|
|
|
int vp_w = p->dst_rect.x1 - p->dst_rect.x0,
|
2015-03-25 22:06:46 +00:00
|
|
|
vp_h = p->dst_rect.y1 - p->dst_rect.y0;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
2015-06-30 23:25:30 +00:00
|
|
|
// Reset the queue completely if this is a still image, to avoid any
|
|
|
|
// interpolation artifacts from surrounding frames when unpausing or
|
|
|
|
// framestepping
|
|
|
|
if (t->still)
|
|
|
|
gl_video_reset_surfaces(p);
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// First of all, figure out if we have a frame availble at all, and draw
|
|
|
|
// it manually + reset the queue if not
|
2015-06-26 08:59:57 +00:00
|
|
|
if (p->surfaces[p->surface_now].pts == MP_NOPTS_VALUE) {
|
2015-07-17 21:21:04 +00:00
|
|
|
gl_video_upload_image(p, t->current);
|
|
|
|
pass_render_frame(p);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
finish_pass_fbo(p, &p->surfaces[p->surface_now].fbotex,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
vp_w, vp_h, FBOTEX_FUZZY);
|
2015-06-26 08:59:57 +00:00
|
|
|
p->surfaces[p->surface_now].pts = p->image.mpi->pts;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
p->surface_idx = p->surface_now;
|
|
|
|
}
|
|
|
|
|
2015-06-26 08:59:57 +00:00
|
|
|
// Find the right frame for this instant
|
2015-07-01 17:24:28 +00:00
|
|
|
if (t->current&& t->current->pts != MP_NOPTS_VALUE) {
|
2015-06-26 08:59:57 +00:00
|
|
|
int next = fbosurface_wrap(p->surface_now + 1);
|
|
|
|
while (p->surfaces[next].pts != MP_NOPTS_VALUE &&
|
|
|
|
p->surfaces[next].pts > p->surfaces[p->surface_now].pts &&
|
2015-07-01 17:24:28 +00:00
|
|
|
p->surfaces[p->surface_now].pts < t->current->pts)
|
2015-06-26 08:59:57 +00:00
|
|
|
{
|
|
|
|
p->surface_now = next;
|
|
|
|
next = fbosurface_wrap(next + 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Figure out the queue size. For illustration, a filter radius of 2 would
|
|
|
|
// look like this: _ A [B] C D _
|
2015-11-28 14:45:35 +00:00
|
|
|
// A is surface_bse, B is surface_now, C is surface_now+1 and D is
|
2015-03-13 18:30:31 +00:00
|
|
|
// surface_end.
|
2016-03-05 08:42:57 +00:00
|
|
|
struct scaler *tscale = &p->scaler[SCALER_TSCALE];
|
|
|
|
reinit_scaler(p, tscale, &p->opts.scaler[SCALER_TSCALE], 1, tscale_sizes);
|
2015-07-11 11:55:45 +00:00
|
|
|
bool oversample = strcmp(tscale->conf.kernel.name, "oversample") == 0;
|
2015-03-15 06:11:51 +00:00
|
|
|
int size;
|
2015-03-13 18:30:31 +00:00
|
|
|
|
2015-07-11 11:55:45 +00:00
|
|
|
if (oversample) {
|
|
|
|
size = 2;
|
|
|
|
} else {
|
|
|
|
assert(tscale->kernel && !tscale->kernel->polar);
|
|
|
|
size = ceil(tscale->kernel->size);
|
|
|
|
assert(size <= TEXUNIT_VIDEO_NUM);
|
|
|
|
}
|
2015-06-26 08:59:57 +00:00
|
|
|
|
|
|
|
int radius = size/2;
|
2015-03-13 18:30:31 +00:00
|
|
|
int surface_now = p->surface_now;
|
|
|
|
int surface_bse = fbosurface_wrap(surface_now - (radius-1));
|
|
|
|
int surface_end = fbosurface_wrap(surface_now + radius);
|
|
|
|
assert(fbosurface_wrap(surface_bse + size-1) == surface_end);
|
|
|
|
|
2015-06-26 08:59:57 +00:00
|
|
|
// Render new frames while there's room in the queue. Note that technically,
|
|
|
|
// this should be done before the step where we find the right frame, but
|
|
|
|
// it only barely matters at the very beginning of playback, and this way
|
|
|
|
// makes the code much more linear.
|
2015-03-13 18:30:31 +00:00
|
|
|
int surface_dst = fbosurface_wrap(p->surface_idx+1);
|
2015-07-01 17:24:28 +00:00
|
|
|
for (int i = 0; i < t->num_frames; i++) {
|
2015-06-26 08:59:57 +00:00
|
|
|
// Avoid overwriting data we might still need
|
|
|
|
if (surface_dst == surface_bse - 1)
|
|
|
|
break;
|
|
|
|
|
2015-07-01 17:24:28 +00:00
|
|
|
struct mp_image *f = t->frames[i];
|
2015-07-15 12:58:56 +00:00
|
|
|
if (!mp_image_params_equal(&f->params, &p->real_image_params) ||
|
|
|
|
f->pts == MP_NOPTS_VALUE)
|
2015-06-26 08:59:57 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (f->pts > p->surfaces[p->surface_idx].pts) {
|
2015-07-17 21:21:04 +00:00
|
|
|
gl_video_upload_image(p, f);
|
|
|
|
pass_render_frame(p);
|
2015-06-26 08:59:57 +00:00
|
|
|
finish_pass_fbo(p, &p->surfaces[surface_dst].fbotex,
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
vp_w, vp_h, FBOTEX_FUZZY);
|
2015-06-26 08:59:57 +00:00
|
|
|
p->surfaces[surface_dst].pts = f->pts;
|
|
|
|
p->surface_idx = surface_dst;
|
|
|
|
surface_dst = fbosurface_wrap(surface_dst+1);
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Figure out whether the queue is "valid". A queue is invalid if the
|
|
|
|
// frames' PTS is not monotonically increasing. Anything else is invalid,
|
|
|
|
// so avoid blending incorrect data and just draw the latest frame as-is.
|
|
|
|
// Possible causes for failure of this condition include seeks, pausing,
|
|
|
|
// end of playback or start of playback.
|
|
|
|
bool valid = true;
|
2015-03-15 22:25:01 +00:00
|
|
|
for (int i = surface_bse, ii; valid && i != surface_end; i = ii) {
|
|
|
|
ii = fbosurface_wrap(i+1);
|
2015-06-26 08:59:57 +00:00
|
|
|
if (p->surfaces[i].pts == MP_NOPTS_VALUE ||
|
|
|
|
p->surfaces[ii].pts == MP_NOPTS_VALUE)
|
|
|
|
{
|
2015-03-13 18:30:31 +00:00
|
|
|
valid = false;
|
2015-03-15 22:25:01 +00:00
|
|
|
} else if (p->surfaces[ii].pts < p->surfaces[i].pts) {
|
|
|
|
valid = false;
|
|
|
|
MP_DBG(p, "interpolation queue underrun\n");
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-25 21:40:10 +00:00
|
|
|
// Update OSD PTS to synchronize subtitles with the displayed frame
|
2015-06-26 08:59:57 +00:00
|
|
|
p->osd_pts = p->surfaces[surface_now].pts;
|
2015-03-25 21:40:10 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// Finally, draw the right mix of frames to the screen.
|
2015-06-30 23:25:30 +00:00
|
|
|
if (!valid || t->still) {
|
2015-03-13 18:30:31 +00:00
|
|
|
// surface_now is guaranteed to be valid, so we can safely use it.
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
pass_read_fbo(p, &p->surfaces[surface_now].fbotex);
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
p->is_interpolated = false;
|
|
|
|
} else {
|
2015-11-28 14:45:35 +00:00
|
|
|
double mix = t->vsync_offset / t->ideal_frame_duration;
|
2015-06-26 08:59:57 +00:00
|
|
|
// The scaler code always wants the fcoord to be between 0 and 1,
|
|
|
|
// so we try to adjust by using the previous set of N frames instead
|
|
|
|
// (which requires some extra checking to make sure it's valid)
|
|
|
|
if (mix < 0.0) {
|
|
|
|
int prev = fbosurface_wrap(surface_bse - 1);
|
|
|
|
if (p->surfaces[prev].pts != MP_NOPTS_VALUE &&
|
|
|
|
p->surfaces[prev].pts < p->surfaces[surface_bse].pts)
|
|
|
|
{
|
|
|
|
mix += 1.0;
|
|
|
|
surface_bse = prev;
|
|
|
|
} else {
|
|
|
|
mix = 0.0; // at least don't blow up, this should only
|
|
|
|
// ever happen at the start of playback
|
|
|
|
}
|
2015-03-15 06:11:51 +00:00
|
|
|
}
|
2015-06-26 08:59:57 +00:00
|
|
|
|
2015-07-11 11:55:45 +00:00
|
|
|
// Blend the frames together
|
|
|
|
if (oversample) {
|
2015-11-28 14:45:35 +00:00
|
|
|
double vsync_dist = t->vsync_interval / t->ideal_frame_duration,
|
2015-07-11 11:55:45 +00:00
|
|
|
threshold = tscale->conf.kernel.params[0];
|
|
|
|
threshold = isnan(threshold) ? 0.0 : threshold;
|
|
|
|
mix = (1 - mix) / vsync_dist;
|
|
|
|
mix = mix <= 0 + threshold ? 0 : mix;
|
|
|
|
mix = mix >= 1 - threshold ? 1 : mix;
|
|
|
|
mix = 1 - mix;
|
|
|
|
gl_sc_uniform_f(p->sc, "inter_coeff", mix);
|
2016-02-23 15:18:17 +00:00
|
|
|
GLSL(color = mix(texture(texture0, texcoord0),
|
|
|
|
texture(texture1, texcoord1),
|
|
|
|
inter_coeff);)
|
2015-07-11 11:55:45 +00:00
|
|
|
} else {
|
|
|
|
gl_sc_uniform_f(p->sc, "fcoord", mix);
|
2015-09-05 12:03:00 +00:00
|
|
|
pass_sample_separated_gen(p->sc, tscale, 0, 0);
|
2015-07-11 11:55:45 +00:00
|
|
|
}
|
2015-06-26 08:59:57 +00:00
|
|
|
|
|
|
|
// Load all the required frames
|
2015-03-13 18:30:31 +00:00
|
|
|
for (int i = 0; i < size; i++) {
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
struct img_tex img =
|
|
|
|
img_tex_fbo(&p->surfaces[fbosurface_wrap(surface_bse+i)].fbotex,
|
2016-03-05 11:38:51 +00:00
|
|
|
identity_trans, PLANE_RGB, p->components);
|
vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
2016-03-05 10:29:19 +00:00
|
|
|
// Since the code in pass_sample_separated currently assumes
|
|
|
|
// the textures are bound in-order and starting at 0, we just
|
|
|
|
// assert to make sure this is the case (which it should always be)
|
|
|
|
int id = pass_bind(p, img);
|
|
|
|
assert(id == i);
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
2015-06-26 08:59:57 +00:00
|
|
|
|
2015-11-28 14:45:35 +00:00
|
|
|
MP_DBG(p, "inter frame dur: %f vsync: %f, mix: %f\n",
|
|
|
|
t->ideal_frame_duration, t->vsync_interval, mix);
|
2015-03-13 18:30:31 +00:00
|
|
|
p->is_interpolated = true;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
pass_draw_to_screen(p, fbo);
|
2015-07-02 11:17:20 +00:00
|
|
|
|
|
|
|
p->frames_drawn += 1;
|
2014-11-23 19:06:05 +00:00
|
|
|
}
|
|
|
|
|
2015-03-23 15:28:33 +00:00
|
|
|
// (fbo==0 makes BindFramebuffer select the screen backbuffer)
|
2015-07-01 17:24:28 +00:00
|
|
|
void gl_video_render_frame(struct gl_video *p, struct vo_frame *frame, int fbo)
|
2015-03-23 15:28:33 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
|
2015-07-01 17:24:28 +00:00
|
|
|
bool has_frame = frame->current || vimg->mpi;
|
|
|
|
|
|
|
|
if (!has_frame || p->dst_rect.x0 > 0 || p->dst_rect.y0 > 0 ||
|
2015-03-23 15:28:33 +00:00
|
|
|
p->dst_rect.x1 < p->vp_w || p->dst_rect.y1 < abs(p->vp_h))
|
|
|
|
{
|
|
|
|
struct m_color c = p->opts.background;
|
|
|
|
gl->ClearColor(c.r / 255.0, c.g / 255.0, c.b / 255.0, c.a / 255.0);
|
|
|
|
gl->Clear(GL_COLOR_BUFFER_BIT);
|
|
|
|
}
|
|
|
|
|
2015-07-01 17:24:28 +00:00
|
|
|
if (has_frame) {
|
|
|
|
gl_sc_set_vao(p->sc, &p->vao);
|
2015-03-23 15:28:33 +00:00
|
|
|
|
2016-01-27 20:07:17 +00:00
|
|
|
bool interpolate = p->opts.interpolation && frame->display_synced &&
|
|
|
|
(p->frames_drawn || !frame->still);
|
|
|
|
if (interpolate) {
|
|
|
|
double ratio = frame->ideal_frame_duration / frame->vsync_interval;
|
|
|
|
if (fabs(ratio - 1.0) < p->opts.interpolation_threshold)
|
|
|
|
interpolate = false;
|
|
|
|
}
|
2016-01-25 20:46:01 +00:00
|
|
|
|
2016-01-27 20:07:17 +00:00
|
|
|
if (interpolate) {
|
2015-07-01 17:24:28 +00:00
|
|
|
gl_video_interpolate_frame(p, frame, fbo);
|
|
|
|
} else {
|
2015-09-05 10:02:02 +00:00
|
|
|
bool is_new = !frame->redraw && !frame->repeat;
|
|
|
|
if (is_new || !p->output_fbo_valid) {
|
2015-11-15 17:30:54 +00:00
|
|
|
p->output_fbo_valid = false;
|
|
|
|
|
2015-07-17 21:21:04 +00:00
|
|
|
gl_video_upload_image(p, frame->current);
|
2015-09-05 10:02:02 +00:00
|
|
|
pass_render_frame(p);
|
|
|
|
|
2015-11-15 17:30:54 +00:00
|
|
|
// For the non-interplation case, we draw to a single "cache"
|
|
|
|
// FBO to speed up subsequent re-draws (if any exist)
|
|
|
|
int dest_fbo = fbo;
|
|
|
|
if (frame->num_vsyncs > 1 && frame->display_synced &&
|
2015-11-19 20:22:24 +00:00
|
|
|
!p->dumb_mode && gl->BlitFramebuffer)
|
2015-10-30 11:53:43 +00:00
|
|
|
{
|
2015-11-15 17:30:54 +00:00
|
|
|
fbotex_change(&p->output_fbo, p->gl, p->log,
|
|
|
|
p->vp_w, abs(p->vp_h),
|
|
|
|
p->opts.fbo_format, 0);
|
|
|
|
dest_fbo = p->output_fbo.fbo;
|
2015-09-05 10:02:02 +00:00
|
|
|
p->output_fbo_valid = true;
|
|
|
|
}
|
2015-11-15 17:30:54 +00:00
|
|
|
pass_draw_to_screen(p, dest_fbo);
|
2015-09-05 10:02:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// "output fbo valid" and "output fbo needed" are equivalent
|
|
|
|
if (p->output_fbo_valid) {
|
2015-11-15 17:30:54 +00:00
|
|
|
gl->BindFramebuffer(GL_READ_FRAMEBUFFER, p->output_fbo.fbo);
|
|
|
|
gl->BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
|
|
|
|
struct mp_rect rc = p->dst_rect;
|
|
|
|
if (p->vp_h < 0) {
|
|
|
|
rc.y1 = -p->vp_h - p->dst_rect.y0;
|
|
|
|
rc.y0 = -p->vp_h - p->dst_rect.y1;
|
|
|
|
}
|
|
|
|
gl->BlitFramebuffer(rc.x0, rc.y0, rc.x1, rc.y1,
|
|
|
|
rc.x0, rc.y0, rc.x1, rc.y1,
|
|
|
|
GL_COLOR_BUFFER_BIT, GL_NEAREST);
|
|
|
|
gl->BindFramebuffer(GL_READ_FRAMEBUFFER, 0);
|
|
|
|
gl->BindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
|
2015-09-05 10:02:02 +00:00
|
|
|
}
|
2015-07-01 17:24:28 +00:00
|
|
|
}
|
2015-03-23 15:28:33 +00:00
|
|
|
}
|
|
|
|
|
2015-06-26 08:59:57 +00:00
|
|
|
debug_check_gl(p, "after video rendering");
|
|
|
|
|
2015-03-23 15:28:33 +00:00
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo);
|
|
|
|
|
2015-03-23 01:42:19 +00:00
|
|
|
if (p->osd) {
|
|
|
|
pass_draw_osd(p, p->opts.blend_subs ? OSD_DRAW_OSD_ONLY : 0,
|
2015-03-29 04:34:34 +00:00
|
|
|
p->osd_pts, p->osd_rect, p->vp_w, p->vp_h, fbo, true);
|
2015-03-23 01:42:19 +00:00
|
|
|
debug_check_gl(p, "after OSD rendering");
|
|
|
|
}
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
|
|
|
|
gl->UseProgram(0);
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, 0);
|
|
|
|
|
2015-11-10 13:36:23 +00:00
|
|
|
// The playloop calls this last before waiting some time until it decides
|
|
|
|
// to call flip_page(). Tell OpenGL to start execution of the GPU commands
|
|
|
|
// while we sleep (this happens asynchronously).
|
|
|
|
gl->Flush();
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
p->frames_rendered++;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
// vp_w/vp_h is the implicit size of the target framebuffer.
|
|
|
|
// vp_h can be negative to flip the screen.
|
|
|
|
void gl_video_resize(struct gl_video *p, int vp_w, int vp_h,
|
2013-03-01 20:19:20 +00:00
|
|
|
struct mp_rect *src, struct mp_rect *dst,
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
struct mp_osd_res *osd)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
p->src_rect = *src;
|
|
|
|
p->dst_rect = *dst;
|
|
|
|
p->osd_rect = *osd;
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
p->vp_w = vp_w;
|
|
|
|
p->vp_h = vp_h;
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
|
|
|
|
gl_video_reset_surfaces(p);
|
2016-03-21 21:23:41 +00:00
|
|
|
|
2016-03-23 13:49:39 +00:00
|
|
|
if (p->osd)
|
|
|
|
mpgl_osd_resize(p->osd, p->osd_rect, p->image_params.stereo_out);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-09-02 20:45:07 +00:00
|
|
|
static bool unmap_image(struct gl_video *p, struct mp_image *mpi)
|
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
bool ok = true;
|
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
|
|
|
struct texplane *plane = &vimg->planes[n];
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
ok = gl->UnmapBuffer(GL_PIXEL_UNPACK_BUFFER) && ok;
|
|
|
|
mpi->planes[n] = NULL; // PBO offset 0
|
|
|
|
}
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
|
|
|
return ok;
|
|
|
|
}
|
|
|
|
|
2015-09-02 10:39:19 +00:00
|
|
|
static bool map_image(struct gl_video *p, struct mp_image *mpi)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
if (!p->opts.pbo)
|
|
|
|
return false;
|
|
|
|
|
2013-03-28 19:40:19 +00:00
|
|
|
struct video_image *vimg = &p->image;
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
2013-03-28 19:40:19 +00:00
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2015-04-10 18:58:26 +00:00
|
|
|
mpi->stride[n] = mp_image_plane_w(mpi, n) * p->image_desc.bytes[n];
|
2015-09-02 20:45:07 +00:00
|
|
|
if (!plane->gl_buffer) {
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->GenBuffers(1, &plane->gl_buffer);
|
2015-09-02 20:45:07 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
size_t buffer_size = mp_image_plane_h(mpi, n) * mpi->stride[n];
|
|
|
|
gl->BufferData(GL_PIXEL_UNPACK_BUFFER, buffer_size,
|
2013-03-01 20:19:20 +00:00
|
|
|
NULL, GL_DYNAMIC_DRAW);
|
|
|
|
}
|
2015-09-02 20:45:07 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
mpi->planes[n] = gl->MapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
2015-09-02 20:45:07 +00:00
|
|
|
if (!mpi->planes[n]) {
|
|
|
|
unmap_image(p, mpi);
|
|
|
|
return false;
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2015-09-02 10:44:46 +00:00
|
|
|
memset(mpi->bufs, 0, sizeof(mpi->bufs));
|
2013-03-01 20:19:20 +00:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2015-07-15 10:22:49 +00:00
|
|
|
static void gl_video_upload_image(struct gl_video *p, struct mp_image *mpi)
|
2015-03-22 00:32:03 +00:00
|
|
|
{
|
2015-07-15 10:22:49 +00:00
|
|
|
GL *gl = p->gl;
|
2015-03-22 00:32:03 +00:00
|
|
|
struct video_image *vimg = &p->image;
|
2015-05-01 16:44:45 +00:00
|
|
|
|
2015-07-15 10:22:49 +00:00
|
|
|
mpi = mp_image_new_ref(mpi);
|
|
|
|
if (!mpi)
|
2015-06-26 08:59:57 +00:00
|
|
|
abort();
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
|
|
|
|
2015-07-15 10:22:49 +00:00
|
|
|
vimg->mpi = mpi;
|
2015-05-01 16:44:45 +00:00
|
|
|
p->osd_pts = mpi->pts;
|
2015-07-15 10:22:49 +00:00
|
|
|
p->frames_uploaded++;
|
2013-03-28 19:40:19 +00:00
|
|
|
|
2015-07-26 18:13:53 +00:00
|
|
|
if (p->hwdec_active) {
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
struct gl_hwdec_frame gl_frame = {0};
|
|
|
|
bool ok = p->hwdec->driver->map_frame(p->hwdec, vimg->mpi, &gl_frame) >= 0;
|
|
|
|
vimg->hwdec_mapped = true;
|
|
|
|
if (ok) {
|
|
|
|
struct mp_image layout = {0};
|
|
|
|
mp_image_set_params(&layout, &p->image_params);
|
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
|
|
|
struct gl_hwdec_plane *plane = &gl_frame.planes[n];
|
|
|
|
vimg->planes[n] = (struct texplane){
|
|
|
|
.w = mp_image_plane_w(&layout, n),
|
|
|
|
.h = mp_image_plane_h(&layout, n),
|
|
|
|
.tex_w = plane->tex_w,
|
|
|
|
.tex_h = plane->tex_h,
|
|
|
|
.gl_target = plane->gl_target,
|
|
|
|
.gl_texture = plane->gl_texture,
|
|
|
|
};
|
2016-05-10 19:11:58 +00:00
|
|
|
snprintf(vimg->planes[n].swizzle, sizeof(vimg->planes[n].swizzle),
|
|
|
|
"%s", plane->swizzle);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
}
|
|
|
|
} else {
|
2016-04-27 11:32:20 +00:00
|
|
|
MP_FATAL(p, "Mapping hardware decoded surface failed.\n");
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
|
|
|
}
|
2013-11-03 23:00:18 +00:00
|
|
|
return;
|
2015-07-26 18:13:53 +00:00
|
|
|
}
|
2013-11-03 23:00:18 +00:00
|
|
|
|
|
|
|
assert(mpi->num_planes == p->plane_count);
|
|
|
|
|
2015-09-02 20:45:07 +00:00
|
|
|
mp_image_t pbo_mpi = *mpi;
|
|
|
|
bool pbo = map_image(p, &pbo_mpi);
|
|
|
|
if (pbo) {
|
|
|
|
mp_image_copy(&pbo_mpi, mpi);
|
|
|
|
if (unmap_image(p, &pbo_mpi)) {
|
|
|
|
mpi = &pbo_mpi;
|
|
|
|
} else {
|
|
|
|
MP_FATAL(p, "Video PBO upload failed. Disabling PBOs.\n");
|
|
|
|
pbo = false;
|
|
|
|
p->opts.pbo = 0;
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2015-09-02 20:45:07 +00:00
|
|
|
|
|
|
|
vimg->image_flipped = mpi->stride[0] < 0;
|
2013-11-03 23:00:18 +00:00
|
|
|
for (int n = 0; n < p->plane_count; n++) {
|
2013-03-28 19:40:19 +00:00
|
|
|
struct texplane *plane = &vimg->planes[n];
|
2015-09-02 20:45:07 +00:00
|
|
|
if (pbo)
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, plane->gl_buffer);
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0 + n);
|
2016-02-18 09:46:03 +00:00
|
|
|
gl->BindTexture(plane->gl_target, plane->gl_texture);
|
|
|
|
glUploadTex(gl, plane->gl_target, plane->gl_format, plane->gl_type,
|
2015-09-02 20:45:07 +00:00
|
|
|
mpi->planes[n], mpi->stride[n], 0, 0, plane->w, plane->h, 0);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2014-12-20 16:22:36 +00:00
|
|
|
if (pbo)
|
|
|
|
gl->BindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-09-05 09:39:20 +00:00
|
|
|
static bool test_fbo(struct gl_video *p)
|
2013-05-30 13:37:13 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
2015-09-05 09:39:20 +00:00
|
|
|
bool success = false;
|
2015-02-26 09:35:49 +00:00
|
|
|
MP_VERBOSE(p, "Testing user-set FBO format (0x%x)\n",
|
|
|
|
(unsigned)p->opts.fbo_format);
|
2013-05-30 13:37:13 +00:00
|
|
|
struct fbotex fbo = {0};
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
if (fbotex_init(&fbo, p->gl, p->log, 16, 16, p->opts.fbo_format)) {
|
2013-05-30 13:37:13 +00:00
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, fbo.fbo);
|
|
|
|
gl->BindFramebuffer(GL_FRAMEBUFFER, 0);
|
2015-09-05 09:39:20 +00:00
|
|
|
success = true;
|
2013-05-30 13:37:13 +00:00
|
|
|
}
|
2015-01-29 13:58:26 +00:00
|
|
|
fbotex_uninit(&fbo);
|
2013-09-11 22:57:32 +00:00
|
|
|
glCheckError(gl, p->log, "FBO test");
|
2015-09-05 09:39:20 +00:00
|
|
|
return success;
|
2013-05-30 13:37:13 +00:00
|
|
|
}
|
|
|
|
|
2015-11-19 20:22:24 +00:00
|
|
|
// Return whether dumb-mode can be used without disabling any features.
|
|
|
|
// Essentially, vo_opengl with mostly default settings will return true.
|
|
|
|
static bool check_dumb_mode(struct gl_video *p)
|
|
|
|
{
|
|
|
|
struct gl_video_opts *o = &p->opts;
|
2016-01-26 19:47:32 +00:00
|
|
|
if (p->use_integer_conversion)
|
|
|
|
return false;
|
2015-11-19 20:22:24 +00:00
|
|
|
if (o->dumb_mode)
|
|
|
|
return true;
|
|
|
|
if (o->target_prim || o->target_trc || o->linear_scaling ||
|
|
|
|
o->correct_downscaling || o->sigmoid_upscaling || o->interpolation ||
|
2016-03-05 11:02:01 +00:00
|
|
|
o->blend_subs || o->deband || o->unsharp || o->prescale_luma)
|
2015-11-19 20:22:24 +00:00
|
|
|
return false;
|
2016-03-05 08:42:57 +00:00
|
|
|
// check remaining scalers (tscale is already implicitly excluded above)
|
|
|
|
for (int i = 0; i < SCALER_COUNT; i++) {
|
|
|
|
if (i != SCALER_TSCALE) {
|
|
|
|
const char *name = o->scaler[i].kernel.name;
|
|
|
|
if (name && strcmp(name, "bilinear") != 0)
|
|
|
|
return false;
|
|
|
|
}
|
2015-11-19 20:22:24 +00:00
|
|
|
}
|
|
|
|
if (o->pre_shaders && o->pre_shaders[0])
|
|
|
|
return false;
|
|
|
|
if (o->post_shaders && o->post_shaders[0])
|
|
|
|
return false;
|
|
|
|
if (p->use_lut_3d)
|
|
|
|
return false;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
// Disable features that are not supported with the current OpenGL version.
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
static void check_gl_features(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
2016-05-12 18:08:49 +00:00
|
|
|
bool have_float_tex = !!gl_find_float16_format(gl, 1);
|
2014-12-23 01:48:58 +00:00
|
|
|
bool have_3d_tex = gl->mpgl_caps & MPGL_CAP_3D_TEX;
|
2014-03-05 14:01:32 +00:00
|
|
|
bool have_mix = gl->glsl_version >= 130;
|
2015-11-16 19:09:15 +00:00
|
|
|
bool have_texrg = gl->mpgl_caps & MPGL_CAP_TEX_RG;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
if (!p->opts.fbo_format) {
|
|
|
|
p->opts.fbo_format = GL_RGBA16;
|
|
|
|
if (gl->es && !(gl->mpgl_caps & MPGL_CAP_EXT16))
|
|
|
|
p->opts.fbo_format = have_float_tex ? GL_RGBA16F : GL_RGB10_A2;
|
2015-11-19 20:20:50 +00:00
|
|
|
}
|
2016-05-12 18:08:49 +00:00
|
|
|
bool have_fbo = test_fbo(p);
|
2015-11-19 20:20:50 +00:00
|
|
|
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
if (gl->es && p->opts.pbo) {
|
|
|
|
p->opts.pbo = 0;
|
|
|
|
MP_WARN(p, "Disabling PBOs (GLES unsupported).\n");
|
|
|
|
}
|
|
|
|
|
2016-01-26 19:47:32 +00:00
|
|
|
p->forced_dumb_mode = p->opts.dumb_mode || !have_fbo || !have_texrg;
|
2015-11-19 20:22:24 +00:00
|
|
|
bool voluntarily_dumb = check_dumb_mode(p);
|
2016-01-26 19:47:32 +00:00
|
|
|
if (p->forced_dumb_mode || voluntarily_dumb) {
|
2015-11-19 20:22:24 +00:00
|
|
|
if (voluntarily_dumb) {
|
|
|
|
MP_VERBOSE(p, "No advanced processing required. Enabling dumb mode.\n");
|
|
|
|
} else if (!p->opts.dumb_mode) {
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
MP_WARN(p, "High bit depth FBOs unsupported. Enabling dumb mode.\n"
|
|
|
|
"Most extended features will be disabled.\n");
|
|
|
|
}
|
2015-11-19 20:22:24 +00:00
|
|
|
p->dumb_mode = true;
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
p->use_lut_3d = false;
|
2015-09-08 20:55:01 +00:00
|
|
|
// Most things don't work, so whitelist all options that still work.
|
|
|
|
struct gl_video_opts new_opts = {
|
|
|
|
.gamma = p->opts.gamma,
|
|
|
|
.gamma_auto = p->opts.gamma_auto,
|
|
|
|
.pbo = p->opts.pbo,
|
|
|
|
.fbo_format = p->opts.fbo_format,
|
|
|
|
.alpha_mode = p->opts.alpha_mode,
|
|
|
|
.use_rectangle = p->opts.use_rectangle,
|
|
|
|
.background = p->opts.background,
|
|
|
|
.dither_algo = -1,
|
|
|
|
};
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++)
|
|
|
|
new_opts.scaler[n] = gl_video_opts_def.scaler[n];
|
2015-09-08 20:55:01 +00:00
|
|
|
assign_options(&p->opts, &new_opts);
|
2015-09-09 18:40:04 +00:00
|
|
|
p->opts.deband_opts = m_config_alloc_struct(NULL, &deband_conf);
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
return;
|
2015-09-05 09:39:20 +00:00
|
|
|
}
|
2015-11-19 20:22:24 +00:00
|
|
|
p->dumb_mode = false;
|
2015-09-05 09:39:20 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
// Normally, we want to disable them by default if FBOs are unavailable,
|
|
|
|
// because they will be slow (not critically slow, but still slower).
|
|
|
|
// Without FP textures, we must always disable them.
|
2014-12-16 17:55:02 +00:00
|
|
|
// I don't know if luminance alpha float textures exist, so disregard them.
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
const struct filter_kernel *kernel =
|
|
|
|
mp_find_filter_kernel(p->opts.scaler[n].kernel.name);
|
2015-02-26 09:35:49 +00:00
|
|
|
if (kernel) {
|
|
|
|
char *reason = NULL;
|
|
|
|
if (!have_float_tex)
|
2015-07-27 21:18:19 +00:00
|
|
|
reason = "(float tex. missing)";
|
2015-02-26 09:35:49 +00:00
|
|
|
if (reason) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
p->opts.scaler[n].kernel.name = "bilinear";
|
2015-07-27 21:18:19 +00:00
|
|
|
MP_WARN(p, "Disabling scaler #%d %s.\n", n, reason);
|
2016-05-12 17:34:02 +00:00
|
|
|
if (n == SCALER_TSCALE)
|
|
|
|
p->opts.interpolation = 0;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-19 00:03:08 +00:00
|
|
|
// GLES3 doesn't provide filtered 16 bit integer textures
|
|
|
|
// GLES2 doesn't even provide 3D textures
|
2016-02-13 14:33:00 +00:00
|
|
|
if (p->use_lut_3d && (!have_3d_tex || gl->es)) {
|
2014-12-17 20:48:23 +00:00
|
|
|
p->use_lut_3d = false;
|
2015-04-11 17:24:54 +00:00
|
|
|
MP_WARN(p, "Disabling color management (GLES unsupported).\n");
|
2014-12-17 20:48:23 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
int use_cms = p->opts.target_prim != MP_CSP_PRIM_AUTO ||
|
|
|
|
p->opts.target_trc != MP_CSP_TRC_AUTO || p->use_lut_3d;
|
2014-03-05 14:01:32 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
// mix() is needed for some gamma functions
|
|
|
|
if (!have_mix && (p->opts.linear_scaling || p->opts.sigmoid_upscaling)) {
|
|
|
|
p->opts.linear_scaling = false;
|
|
|
|
p->opts.sigmoid_upscaling = false;
|
2015-04-11 17:24:54 +00:00
|
|
|
MP_WARN(p, "Disabling linear/sigmoid scaling (GLSL version too old).\n");
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
}
|
|
|
|
if (!have_mix && use_cms) {
|
|
|
|
p->opts.target_prim = MP_CSP_PRIM_AUTO;
|
|
|
|
p->opts.target_trc = MP_CSP_TRC_AUTO;
|
|
|
|
p->use_lut_3d = false;
|
2015-04-11 17:24:54 +00:00
|
|
|
MP_WARN(p, "Disabling color management (GLSL version too old).\n");
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2015-10-01 18:44:39 +00:00
|
|
|
if (!have_mix && p->opts.deband) {
|
|
|
|
p->opts.deband = 0;
|
|
|
|
MP_WARN(p, "Disabling debanding (GLSL version too old).\n");
|
|
|
|
}
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
|
2016-03-05 11:02:01 +00:00
|
|
|
if (p->opts.prescale_luma == 2) {
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
if (p->opts.nnedi3_opts->upload == NNEDI3_UPLOAD_UBO) {
|
|
|
|
// Check features for uniform buffer objects.
|
2015-12-02 00:28:26 +00:00
|
|
|
if (!gl->BindBufferBase || !gl->GetUniformBlockIndex) {
|
|
|
|
MP_WARN(p, "Disabling NNEDI3 (%s required).\n",
|
|
|
|
gl->es ? "OpenGL ES 3.0" : "OpenGL 3.1");
|
2016-03-05 11:02:01 +00:00
|
|
|
p->opts.prescale_luma = 0;
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
}
|
|
|
|
} else if (p->opts.nnedi3_opts->upload == NNEDI3_UPLOAD_SHADER) {
|
|
|
|
// Check features for hard coding approach.
|
2015-12-02 00:28:26 +00:00
|
|
|
if ((!gl->es && gl->glsl_version < 330) ||
|
|
|
|
(gl->es && gl->glsl_version < 300))
|
|
|
|
{
|
|
|
|
MP_WARN(p, "Disabling NNEDI3 (%s required).\n",
|
|
|
|
gl->es ? "OpenGL ES 3.0" : "OpenGL 3.3");
|
2016-03-05 11:02:01 +00:00
|
|
|
p->opts.prescale_luma = 0;
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
static void init_gl(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
debug_check_gl(p, "before init_gl");
|
|
|
|
|
2015-12-19 10:56:19 +00:00
|
|
|
MP_VERBOSE(p, "Reported display depth: R=%d, G=%d, B=%d\n",
|
|
|
|
gl->fb_r, gl->fb_g, gl->fb_b);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->Disable(GL_DITHER);
|
|
|
|
|
2015-01-28 21:22:29 +00:00
|
|
|
gl_vao_init(&p->vao, gl, sizeof(struct vertex), vertex_vao);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2014-12-09 16:47:02 +00:00
|
|
|
gl_video_set_gl_state(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2014-12-24 15:54:47 +00:00
|
|
|
// Test whether we can use 10 bit. Hope that testing a single format/channel
|
|
|
|
// is good enough (instead of testing all 1-4 channels variants etc.).
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *fmt = gl_find_unorm_format(gl, 2, 1);
|
|
|
|
if (gl->GetTexLevelParameteriv && fmt) {
|
2014-12-24 15:54:47 +00:00
|
|
|
GLuint tex;
|
|
|
|
gl->GenTextures(1, &tex);
|
|
|
|
gl->BindTexture(GL_TEXTURE_2D, tex);
|
|
|
|
gl->TexImage2D(GL_TEXTURE_2D, 0, fmt->internal_format, 64, 64, 0,
|
|
|
|
fmt->format, fmt->type, NULL);
|
|
|
|
GLenum pname = 0;
|
|
|
|
switch (fmt->format) {
|
|
|
|
case GL_RED: pname = GL_TEXTURE_RED_SIZE; break;
|
|
|
|
case GL_LUMINANCE: pname = GL_TEXTURE_LUMINANCE_SIZE; break;
|
|
|
|
}
|
|
|
|
GLint param = 0;
|
|
|
|
if (pname)
|
|
|
|
gl->GetTexLevelParameteriv(GL_TEXTURE_2D, 0, pname, ¶m);
|
|
|
|
if (param) {
|
|
|
|
MP_VERBOSE(p, "16 bit texture depth: %d.\n", (int)param);
|
|
|
|
p->texture_16bit_depth = param;
|
|
|
|
}
|
2015-02-27 21:13:15 +00:00
|
|
|
gl->DeleteTextures(1, &tex);
|
2014-12-24 15:54:47 +00:00
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
debug_check_gl(p, "after init_gl");
|
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_uninit(struct gl_video *p)
|
|
|
|
{
|
2014-12-03 21:37:39 +00:00
|
|
|
if (!p)
|
|
|
|
return;
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
uninit_video(p);
|
|
|
|
|
vo_opengl: refactor shader generation (part 1)
The basic idea is to use dynamically generated shaders instead of a
single monolithic file + a ton of ifdefs. Instead of having to setup
every aspect of it separately (like compiling shaders, setting uniforms,
perfoming the actual rendering steps, the GLSL parts), we generate the
GLSL on the fly, and perform the rendering at the same time. The GLSL
is regenerated every frame, but the actual compiled OpenGL-level shaders
are cached, which makes it fast again. Almost all logic can be in a
single place.
The new code is significantly more flexible, which allows us to improve
the code clarity, performance and add more features easily.
This commit is incomplete. It drops almost all previous code, and
readds only the most important things (some of them actually buggy).
The next commit will complete it - it's separate to preserve authorship
information.
2015-03-12 20:57:54 +00:00
|
|
|
gl_sc_destroy(p->sc);
|
|
|
|
|
2015-01-28 21:22:29 +00:00
|
|
|
gl_vao_uninit(&p->vao);
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
gl->DeleteTextures(1, &p->lut_3d_texture);
|
|
|
|
|
|
|
|
mpgl_osd_destroy(p->osd);
|
|
|
|
|
2015-01-29 14:50:21 +00:00
|
|
|
gl_set_debug_logger(gl, NULL);
|
2014-12-23 01:46:44 +00:00
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
assign_options(&p->opts, &(struct gl_video_opts){0});
|
2013-03-01 20:19:20 +00:00
|
|
|
talloc_free(p);
|
|
|
|
}
|
|
|
|
|
2014-12-09 16:47:02 +00:00
|
|
|
void gl_video_set_gl_state(struct gl_video *p)
|
2016-04-22 10:08:21 +00:00
|
|
|
{
|
|
|
|
// This resets certain important state to defaults.
|
|
|
|
gl_video_unset_gl_state(p);
|
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_unset_gl_state(struct gl_video *p)
|
2014-12-09 16:47:02 +00:00
|
|
|
{
|
|
|
|
GL *gl = p->gl;
|
|
|
|
|
|
|
|
gl->ActiveTexture(GL_TEXTURE0);
|
2015-01-22 17:54:05 +00:00
|
|
|
if (gl->mpgl_caps & MPGL_CAP_ROW_LENGTH)
|
2014-12-18 23:58:56 +00:00
|
|
|
gl->PixelStorei(GL_UNPACK_ROW_LENGTH, 0);
|
|
|
|
gl->PixelStorei(GL_UNPACK_ALIGNMENT, 4);
|
2014-12-09 16:47:02 +00:00
|
|
|
}
|
|
|
|
|
2014-11-23 19:06:05 +00:00
|
|
|
void gl_video_reset(struct gl_video *p)
|
|
|
|
{
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
gl_video_reset_surfaces(p);
|
2014-11-23 19:06:05 +00:00
|
|
|
}
|
|
|
|
|
2015-02-04 22:37:38 +00:00
|
|
|
bool gl_video_showing_interpolated_frame(struct gl_video *p)
|
|
|
|
{
|
|
|
|
return p->is_interpolated;
|
|
|
|
}
|
|
|
|
|
2013-07-18 11:52:38 +00:00
|
|
|
// dest = src.<w> (always using 4 components)
|
2016-05-12 18:08:49 +00:00
|
|
|
static void packed_fmt_swizzle(char w[5], const struct gl_format *texfmt,
|
2014-12-16 17:55:02 +00:00
|
|
|
const struct packed_fmt_entry *fmt)
|
2013-07-18 11:52:38 +00:00
|
|
|
{
|
2014-12-16 17:55:02 +00:00
|
|
|
const char *comp = "rgba";
|
|
|
|
|
2014-12-18 13:46:19 +00:00
|
|
|
// Normally, we work with GL_RG
|
2014-12-16 17:55:02 +00:00
|
|
|
if (texfmt && texfmt->internal_format == GL_LUMINANCE_ALPHA)
|
|
|
|
comp = "ragb";
|
|
|
|
|
2013-07-18 11:52:38 +00:00
|
|
|
for (int c = 0; c < 4; c++)
|
2014-12-16 17:55:02 +00:00
|
|
|
w[c] = comp[MPMAX(fmt->components[c] - 1, 0)];
|
2013-07-18 11:52:38 +00:00
|
|
|
w[4] = '\0';
|
|
|
|
}
|
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
// Like gl_find_unorm_format(), but takes bits (not bytes), and but if no fixed
|
|
|
|
// point format is available, return an unsigned integer format.
|
|
|
|
static const struct gl_format *find_plane_format(GL *gl, int bytes_per_comp,
|
2016-01-26 19:47:32 +00:00
|
|
|
int n_channels)
|
|
|
|
{
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *f = gl_find_unorm_format(gl, bytes_per_comp, n_channels);
|
|
|
|
if (f)
|
|
|
|
return f;
|
|
|
|
return gl_find_uint_format(gl, bytes_per_comp, n_channels);
|
2016-01-26 19:47:32 +00:00
|
|
|
}
|
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
static void init_image_desc(struct gl_video *p, int fmt)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
p->image_desc = mp_imgfmt_get_desc(fmt);
|
2014-12-16 17:55:02 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
p->plane_count = p->image_desc.num_planes;
|
|
|
|
p->is_yuv = p->image_desc.flags & MP_IMGFLAG_YUV;
|
|
|
|
p->has_alpha = p->image_desc.flags & MP_IMGFLAG_ALPHA;
|
|
|
|
p->use_integer_conversion = false;
|
|
|
|
p->color_swizzle[0] = '\0';
|
|
|
|
p->is_packed_yuv = fmt == IMGFMT_UYVY || fmt == IMGFMT_YUYV;
|
|
|
|
p->hwdec_active = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
// test_only=true checks if the format is supported
|
|
|
|
// test_only=false also initializes some rendering parameters accordingly
|
2016-05-10 16:49:49 +00:00
|
|
|
static bool init_format(struct gl_video *p, int fmt, bool test_only)
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
{
|
2016-05-10 16:49:49 +00:00
|
|
|
struct GL *gl = p->gl;
|
2013-11-03 23:00:18 +00:00
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
struct mp_imgfmt_desc desc = mp_imgfmt_get_desc(fmt);
|
|
|
|
if (!desc.id)
|
|
|
|
return false;
|
|
|
|
|
2013-03-28 19:48:53 +00:00
|
|
|
if (desc.num_planes > 4)
|
|
|
|
return false;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *plane_format[4] = {0};
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
char color_swizzle[5] = "";
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
// YUV/planar formats
|
2015-10-18 16:37:24 +00:00
|
|
|
if (desc.flags & (MP_IMGFLAG_YUV_P | MP_IMGFLAG_RGB_P)) {
|
2015-01-21 18:29:18 +00:00
|
|
|
int bits = desc.component_bits;
|
2013-03-28 19:48:53 +00:00
|
|
|
if ((desc.flags & MP_IMGFLAG_NE) && bits >= 8 && bits <= 16) {
|
2016-01-26 19:47:32 +00:00
|
|
|
plane_format[0] = find_plane_format(gl, (bits + 7) / 8, 1);
|
2016-05-10 16:49:49 +00:00
|
|
|
for (int n = 1; n < desc.num_planes; n++)
|
|
|
|
plane_format[n] = plane_format[0];
|
2015-10-18 16:37:24 +00:00
|
|
|
// RGB/planar
|
|
|
|
if (desc.flags & MP_IMGFLAG_RGB_P)
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
snprintf(color_swizzle, sizeof(color_swizzle), "brga");
|
2014-03-01 14:40:46 +00:00
|
|
|
goto supported;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-03-28 20:02:41 +00:00
|
|
|
// YUV/half-packed
|
2016-01-07 15:28:19 +00:00
|
|
|
if (desc.flags & MP_IMGFLAG_YUV_NV) {
|
|
|
|
int bits = desc.component_bits;
|
|
|
|
if ((desc.flags & MP_IMGFLAG_NE) && bits >= 8 && bits <= 16) {
|
2016-01-26 19:47:32 +00:00
|
|
|
plane_format[0] = find_plane_format(gl, (bits + 7) / 8, 1);
|
|
|
|
plane_format[1] = find_plane_format(gl, (bits + 7) / 8, 2);
|
2016-01-07 15:28:19 +00:00
|
|
|
if (desc.flags & MP_IMGFLAG_YUV_NV_SWAP)
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
snprintf(color_swizzle, sizeof(color_swizzle), "rbga");
|
2016-01-07 15:28:19 +00:00
|
|
|
goto supported;
|
|
|
|
}
|
2013-03-28 20:02:41 +00:00
|
|
|
}
|
|
|
|
|
2013-06-14 20:58:21 +00:00
|
|
|
// XYZ (same organization as RGB packed, but requires conversion matrix)
|
2014-03-01 14:40:46 +00:00
|
|
|
if (fmt == IMGFMT_XYZ12) {
|
2016-05-12 18:08:49 +00:00
|
|
|
plane_format[0] = gl_find_unorm_format(gl, 2, 3);
|
2014-03-01 14:40:46 +00:00
|
|
|
goto supported;
|
2013-05-01 21:59:00 +00:00
|
|
|
}
|
|
|
|
|
2013-07-18 11:52:38 +00:00
|
|
|
// Packed RGB(A) formats
|
2014-03-01 14:40:46 +00:00
|
|
|
for (const struct packed_fmt_entry *e = mp_packed_formats; e->fmt; e++) {
|
|
|
|
if (e->fmt == fmt) {
|
|
|
|
int n_comp = desc.bytes[0] / e->component_size;
|
2016-05-12 18:08:49 +00:00
|
|
|
plane_format[0] = gl_find_unorm_format(gl, e->component_size, n_comp);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
packed_fmt_swizzle(color_swizzle, plane_format[0], e);
|
2014-03-01 14:40:46 +00:00
|
|
|
goto supported;
|
2013-03-28 19:48:53 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-12 18:08:49 +00:00
|
|
|
// Special formats for which OpenGL happens to have direct support.
|
|
|
|
plane_format[0] = gl_find_special_format(gl, fmt);
|
|
|
|
if (plane_format[0]) {
|
|
|
|
// Packed YUV Apple formats color permutation
|
|
|
|
if (plane_format[0]->format == GL_RGB_422_APPLE)
|
|
|
|
snprintf(color_swizzle, sizeof(color_swizzle), "gbra");
|
|
|
|
goto supported;
|
2013-11-13 20:52:34 +00:00
|
|
|
}
|
|
|
|
|
2014-03-01 14:40:46 +00:00
|
|
|
// Unsupported format
|
|
|
|
return false;
|
|
|
|
|
|
|
|
supported:
|
2013-07-18 11:52:38 +00:00
|
|
|
|
2015-01-29 14:50:21 +00:00
|
|
|
if (desc.component_bits > 8 && desc.component_bits < 16) {
|
2016-05-10 16:49:49 +00:00
|
|
|
if (p->texture_16bit_depth < 16)
|
2014-12-24 15:54:47 +00:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-01-26 19:47:32 +00:00
|
|
|
int use_integer = -1;
|
2016-05-10 16:49:49 +00:00
|
|
|
for (int n = 0; n < desc.num_planes; n++) {
|
2016-05-12 18:08:49 +00:00
|
|
|
if (!plane_format[n])
|
2014-12-17 20:48:23 +00:00
|
|
|
return false;
|
2016-05-12 18:08:49 +00:00
|
|
|
int use_int_plane = !!gl_integer_format_to_base(plane_format[n]->format);
|
2016-01-26 19:47:32 +00:00
|
|
|
if (use_integer < 0)
|
|
|
|
use_integer = use_int_plane;
|
|
|
|
if (use_integer != use_int_plane)
|
|
|
|
return false; // mixed planes not supported
|
2014-12-17 20:48:23 +00:00
|
|
|
}
|
2016-01-26 19:47:32 +00:00
|
|
|
|
2016-05-10 16:49:49 +00:00
|
|
|
if (use_integer && p->forced_dumb_mode)
|
2016-01-26 19:47:32 +00:00
|
|
|
return false;
|
2014-12-17 20:48:23 +00:00
|
|
|
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
if (!test_only) {
|
2016-05-10 16:49:49 +00:00
|
|
|
for (int n = 0; n < desc.num_planes; n++) {
|
|
|
|
struct texplane *plane = &p->image.planes[n];
|
2016-05-12 18:08:49 +00:00
|
|
|
const struct gl_format *format = plane_format[n];
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
assert(format);
|
|
|
|
plane->gl_format = format->format;
|
|
|
|
plane->gl_internal_format = format->internal_format;
|
|
|
|
plane->gl_type = format->type;
|
|
|
|
plane->use_integer = use_integer;
|
|
|
|
if (plane->gl_format == GL_LUMINANCE_ALPHA)
|
|
|
|
snprintf(plane->swizzle, sizeof(plane->swizzle), "raaa");
|
|
|
|
}
|
2013-07-18 11:52:38 +00:00
|
|
|
|
2016-05-10 16:49:49 +00:00
|
|
|
init_image_desc(p, fmt);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
|
2016-05-10 16:49:49 +00:00
|
|
|
p->use_integer_conversion = use_integer;
|
|
|
|
snprintf(p->color_swizzle, sizeof(p->color_swizzle), "%s", color_swizzle);
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2013-11-03 23:00:18 +00:00
|
|
|
bool gl_video_check_format(struct gl_video *p, int mp_format)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
if (init_format(p, mp_format, true))
|
|
|
|
return true;
|
|
|
|
if (p->hwdec && p->hwdec->driver->imgfmt == mp_format)
|
|
|
|
return true;
|
|
|
|
return false;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2013-06-07 23:35:44 +00:00
|
|
|
void gl_video_config(struct gl_video *p, struct mp_image_params *params)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-22 17:29:37 +00:00
|
|
|
mp_image_unrefp(&p->image.mpi);
|
2013-11-05 18:08:44 +00:00
|
|
|
|
2015-01-29 18:53:49 +00:00
|
|
|
if (!mp_image_params_equal(&p->real_image_params, params)) {
|
2013-11-05 18:08:44 +00:00
|
|
|
uninit_video(p);
|
2015-01-29 18:53:49 +00:00
|
|
|
p->real_image_params = *params;
|
|
|
|
p->image_params = *params;
|
2014-12-09 20:36:45 +00:00
|
|
|
if (params->imgfmt)
|
2015-01-29 18:53:49 +00:00
|
|
|
init_video(p);
|
2013-11-05 18:08:44 +00:00
|
|
|
}
|
2014-11-07 14:28:12 +00:00
|
|
|
|
vo_opengl: refactor shader generation (part 2)
This adds stuff related to gamma, linear light, sigmoid, BT.2020-CL,
etc, as well as color management. Also adds a new gamma function (gamma22).
This adds new parameters to configure the CMS settings, in particular
letting us target simple colorspaces without requiring usage of a 3DLUT.
This adds smoothmotion. Mostly working, but it's still sensitive to
timing issues. It's based on an actual queue now, but the queue size
is kept small to avoid larger amounts of latency.
Also makes “upscale before blending” the default strategy.
This is justified because the "render after blending" thing doesn't seme
to work consistently any way (introduces stutter due to the way vsync
timing works, or something), so this behavior is a bit closer to master
and makes pausing/unpausing less weird/jumpy.
This adds the remaining scalers, including bicubic_fast, sharpen3,
sharpen5, polar filters and antiringing. Apparently, sharpen3/5 also
consult scale-param1, which was undocumented in master.
This also implements cropping and chroma transformation, plus
rotation/flipping. These are inherently part of the same logic, although
it's a bit rough around the edges in some case, mainly due to the fallback
code paths (for bilinear scaling without indirection).
2015-03-12 21:18:16 +00:00
|
|
|
gl_video_reset_surfaces(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-03-23 15:32:59 +00:00
|
|
|
void gl_video_set_osd_source(struct gl_video *p, struct osd_state *osd)
|
|
|
|
{
|
|
|
|
mpgl_osd_destroy(p->osd);
|
|
|
|
p->osd = NULL;
|
|
|
|
p->osd_state = osd;
|
|
|
|
recreate_osd(p);
|
|
|
|
}
|
|
|
|
|
2016-02-13 14:33:00 +00:00
|
|
|
struct gl_video *gl_video_init(GL *gl, struct mp_log *log, struct mpv_global *g,
|
|
|
|
struct gl_lcms *cms)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-21 19:32:42 +00:00
|
|
|
if (gl->version < 210 && gl->es < 200) {
|
2014-12-22 11:49:20 +00:00
|
|
|
mp_err(log, "At least OpenGL 2.1 or OpenGL ES 2.0 required.\n");
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2013-03-01 20:19:20 +00:00
|
|
|
struct gl_video *p = talloc_ptrtype(NULL, p);
|
|
|
|
*p = (struct gl_video) {
|
|
|
|
.gl = gl,
|
2015-03-27 12:27:40 +00:00
|
|
|
.global = g,
|
2013-07-31 19:44:21 +00:00
|
|
|
.log = log,
|
2016-02-13 14:33:00 +00:00
|
|
|
.cms = cms,
|
2013-03-01 20:19:20 +00:00
|
|
|
.opts = gl_video_opts_def,
|
2014-12-24 15:54:47 +00:00
|
|
|
.texture_16bit_depth = 16,
|
2015-09-23 20:13:03 +00:00
|
|
|
.sc = gl_sc_create(gl, log),
|
2013-03-01 20:19:20 +00:00
|
|
|
};
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++)
|
|
|
|
p->scaler[n] = (struct scaler){.index = n};
|
2014-12-23 01:46:44 +00:00
|
|
|
gl_video_set_debug(p, true);
|
vo_opengl: restore single pass optimization as separate code path
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
2015-09-07 19:09:06 +00:00
|
|
|
init_gl(p);
|
2013-03-01 20:19:20 +00:00
|
|
|
recreate_osd(p);
|
|
|
|
return p;
|
|
|
|
}
|
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Get static string for scaler shader. If "tscale" is set to true, the
|
|
|
|
// scaler must be a separable convolution filter.
|
|
|
|
static const char *handle_scaler_opt(const char *name, bool tscale)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-26 01:03:44 +00:00
|
|
|
if (name && name[0]) {
|
2013-03-01 20:19:20 +00:00
|
|
|
const struct filter_kernel *kernel = mp_find_filter_kernel(name);
|
2015-03-13 18:30:31 +00:00
|
|
|
if (kernel && (!tscale || !kernel->polar))
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
return kernel->f.name;
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-03-15 06:11:51 +00:00
|
|
|
for (const char *const *filter = tscale ? fixed_tscale_filters
|
|
|
|
: fixed_scale_filters;
|
|
|
|
*filter; filter++) {
|
|
|
|
if (strcmp(*filter, name) == 0)
|
2013-03-01 20:19:20 +00:00
|
|
|
return *filter;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2015-06-09 20:30:32 +00:00
|
|
|
static char **dup_str_array(void *parent, char **src)
|
|
|
|
{
|
|
|
|
if (!src)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
char **res = talloc_new(parent);
|
|
|
|
int num = 0;
|
|
|
|
for (int n = 0; src && src[n]; n++)
|
|
|
|
MP_TARRAY_APPEND(res, res, num, talloc_strdup(res, src[n]));
|
|
|
|
MP_TARRAY_APPEND(res, res, num, NULL);
|
|
|
|
return res;
|
|
|
|
}
|
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
static void assign_options(struct gl_video_opts *dst, struct gl_video_opts *src)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-09-08 20:46:36 +00:00
|
|
|
talloc_free(dst->scale_shader);
|
|
|
|
talloc_free(dst->pre_shaders);
|
|
|
|
talloc_free(dst->post_shaders);
|
2015-09-09 18:40:04 +00:00
|
|
|
talloc_free(dst->deband_opts);
|
2015-10-26 22:43:48 +00:00
|
|
|
talloc_free(dst->superxbr_opts);
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
talloc_free(dst->nnedi3_opts);
|
2015-06-09 20:30:32 +00:00
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
*dst = *src;
|
2015-06-09 20:30:32 +00:00
|
|
|
|
2015-09-09 18:40:04 +00:00
|
|
|
if (src->deband_opts)
|
|
|
|
dst->deband_opts = m_sub_options_copy(NULL, &deband_conf, src->deband_opts);
|
|
|
|
|
2015-10-26 22:43:48 +00:00
|
|
|
if (src->superxbr_opts) {
|
|
|
|
dst->superxbr_opts = m_sub_options_copy(NULL, &superxbr_conf,
|
|
|
|
src->superxbr_opts);
|
|
|
|
}
|
|
|
|
|
vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes #2230
2015-10-28 01:37:55 +00:00
|
|
|
if (src->nnedi3_opts) {
|
|
|
|
dst->nnedi3_opts = m_sub_options_copy(NULL, &nnedi3_conf,
|
|
|
|
src->nnedi3_opts);
|
|
|
|
}
|
|
|
|
|
2016-03-05 08:42:57 +00:00
|
|
|
for (int n = 0; n < SCALER_COUNT; n++) {
|
2015-09-08 20:46:36 +00:00
|
|
|
dst->scaler[n].kernel.name =
|
2016-03-05 08:42:57 +00:00
|
|
|
(char *)handle_scaler_opt(dst->scaler[n].kernel.name,
|
|
|
|
n == SCALER_TSCALE);
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
}
|
2015-03-13 18:30:31 +00:00
|
|
|
|
2015-09-08 20:46:36 +00:00
|
|
|
dst->scale_shader = talloc_strdup(NULL, dst->scale_shader);
|
|
|
|
dst->pre_shaders = dup_str_array(NULL, dst->pre_shaders);
|
|
|
|
dst->post_shaders = dup_str_array(NULL, dst->post_shaders);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Set the options, and possibly update the filter chain too.
|
|
|
|
// Note: assumes all options are valid and verified by the option parser.
|
|
|
|
void gl_video_set_options(struct gl_video *p, struct gl_video_opts *opts)
|
|
|
|
{
|
|
|
|
assign_options(&p->opts, opts);
|
2015-07-16 20:43:40 +00:00
|
|
|
|
|
|
|
check_gl_features(p);
|
|
|
|
uninit_rendering(p);
|
2015-11-28 18:59:11 +00:00
|
|
|
|
|
|
|
if (p->opts.interpolation && !p->global->opts->video_sync && !p->dsi_warned) {
|
|
|
|
MP_WARN(p, "Interpolation now requires enabling display-sync mode.\n"
|
|
|
|
"E.g.: --video-sync=display-resample\n");
|
|
|
|
p->dsi_warned = true;
|
|
|
|
}
|
2015-07-16 20:43:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_configure_queue(struct gl_video *p, struct vo *vo)
|
|
|
|
{
|
2015-07-20 19:12:46 +00:00
|
|
|
int queue_size = 1;
|
2015-07-16 20:43:40 +00:00
|
|
|
|
2015-03-13 18:30:31 +00:00
|
|
|
// Figure out an adequate size for the interpolation queue. The larger
|
2015-06-26 08:59:57 +00:00
|
|
|
// the radius, the earlier we need to queue frames.
|
2015-07-16 20:43:40 +00:00
|
|
|
if (p->opts.interpolation) {
|
vo_opengl: refactor scaler configuration
This merges all of the scaler-related options into a single
configuration struct, and also cleans up the way they're passed through
the code. (For example, the scaler index is no longer threaded through
pass_sample, just the scaler configuration itself, and there's no longer
duplication of the params etc.)
In addition, this commit makes scale-down more principled, and turns it
into a scaler in its own right - so there's no longer an ugly separation
between scale and scale-down in the code.
Finally, the radius stuff has been made more proper - filters always
have a radius now (there's no more radius -1), and get a new .resizable
attribute instead for when it's tunable.
User-visible changes:
1. scale-down has been renamed dscale and now has its own set of config
options (dscale-param1, dscale-radius) etc., instead of reusing
scale-param1 (which was arguably a bug).
2. The default radius is no longer fixed at 3, but instead uses that
filter's preferred radius by default. (Scalers with a default radius
other than 3 include sinc, gaussian, box and triangle)
3. scale-radius etc. now goes down to 0.5, rather than 1.0. 0.5 is the
smallest radius that theoretically makes sense, and indeed it's used
by at least one filter (nearest).
Apart from that, it should just be internal changes only.
Note that this sets up for the refactor discussed in #1720, which would
be to merge scaler and window configurations (include parameters etc.)
into a single, simplified string. In the code, this would now basically
just mean getting rid of all the OPT_FLOATRANGE etc. lines related to
scalers and replacing them by a single function that parses a string and
updates the struct scaler_config as appropriate.
2015-03-26 00:55:32 +00:00
|
|
|
const struct filter_kernel *kernel =
|
2016-03-05 08:42:57 +00:00
|
|
|
mp_find_filter_kernel(p->opts.scaler[SCALER_TSCALE].kernel.name);
|
2015-03-13 18:30:31 +00:00
|
|
|
if (kernel) {
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
double radius = kernel->f.radius;
|
2016-03-05 08:42:57 +00:00
|
|
|
radius = radius > 0 ? radius : p->opts.scaler[SCALER_TSCALE].radius;
|
2015-07-20 19:12:46 +00:00
|
|
|
queue_size += 1 + ceil(radius);
|
2015-07-11 11:55:45 +00:00
|
|
|
} else {
|
|
|
|
// Oversample case
|
2015-07-20 19:12:46 +00:00
|
|
|
queue_size += 2;
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
|
|
|
}
|
2013-03-01 20:19:20 +00:00
|
|
|
|
2015-11-25 21:10:55 +00:00
|
|
|
vo_set_queue_params(vo, 0, queue_size);
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-01-06 16:34:29 +00:00
|
|
|
struct mp_csp_equalizer *gl_video_eq_ptr(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-06 16:34:29 +00:00
|
|
|
return &p->video_eq;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
|
|
|
|
2015-01-06 16:34:29 +00:00
|
|
|
// Call when the mp_csp_equalizer returned by gl_video_eq_ptr() was changed.
|
|
|
|
void gl_video_eq_update(struct gl_video *p)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2013-12-21 19:03:36 +00:00
|
|
|
static int validate_scaler_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param)
|
2013-03-01 20:19:20 +00:00
|
|
|
{
|
2015-01-22 18:58:22 +00:00
|
|
|
char s[20] = {0};
|
|
|
|
int r = 1;
|
2015-03-13 18:30:31 +00:00
|
|
|
bool tscale = bstr_equals0(name, "tscale");
|
2013-07-22 00:14:15 +00:00
|
|
|
if (bstr_equals0(param, "help")) {
|
2015-01-22 18:58:22 +00:00
|
|
|
r = M_OPT_EXIT - 1;
|
|
|
|
} else {
|
|
|
|
snprintf(s, sizeof(s), "%.*s", BSTR_P(param));
|
2015-03-13 18:30:31 +00:00
|
|
|
if (!handle_scaler_opt(s, tscale))
|
2015-01-22 18:58:22 +00:00
|
|
|
r = M_OPT_INVALID;
|
|
|
|
}
|
|
|
|
if (r < 1) {
|
2013-12-21 19:03:36 +00:00
|
|
|
mp_info(log, "Available scalers:\n");
|
2015-03-15 06:11:51 +00:00
|
|
|
for (const char *const *filter = tscale ? fixed_tscale_filters
|
|
|
|
: fixed_scale_filters;
|
|
|
|
*filter; filter++) {
|
|
|
|
mp_info(log, " %s\n", *filter);
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
for (int n = 0; mp_filter_kernels[n].f.name; n++) {
|
2015-03-13 18:30:31 +00:00
|
|
|
if (!tscale || !mp_filter_kernels[n].polar)
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
mp_info(log, " %s\n", mp_filter_kernels[n].f.name);
|
2015-03-13 18:30:31 +00:00
|
|
|
}
|
2015-01-22 18:58:22 +00:00
|
|
|
if (s[0])
|
|
|
|
mp_fatal(log, "No scaler named '%s' found!\n", s);
|
2013-07-22 00:14:15 +00:00
|
|
|
}
|
2015-01-22 18:58:22 +00:00
|
|
|
return r;
|
2013-03-01 20:19:20 +00:00
|
|
|
}
|
2013-03-15 19:17:33 +00:00
|
|
|
|
vo_opengl: separate kernel and window
This makes the core much more elegant, reusable, reconfigurable and also
allows us to more easily add aliases for specific configurations.
Furthermore, this lets us apply a generic blur factor / window function
to arbitrary filters, so we can finally "mix and match" in order to
fine-tune windowing functions.
A few notes are in order:
1. The current system for configuring scalers is ugly and rapidly
getting unwieldy. I modified the man page to make it a bit more
bearable, but long-term we have to do something about it; especially
since..
2. There's currently no way to affect the blur factor or parameters of
the window functions themselves. For example, I can't actually
fine-tune the kaiser window's param1, since there's simply no way to
do so in the current API - even though filter_kernels.c supports it
just fine!
3. This removes some lesser used filters (especially those which are
purely window functions to begin with). If anybody asks, you can get
eg. the old behavior of scale=hanning by using
scale=box:scale-window=hanning:scale-radius=1 (and yes, the result is
just as terrible as that sounds - which is why nobody should have
been using them in the first place).
4. This changes the semantics of the "triangle" scaler slightly - it now
has an arbitrary radius. This can possibly produce weird results for
people who were previously using scale-down=triangle, especially if
in combination with scale-radius (for the usual upscaling). The
correct fix for this is to use scale-down=bilinear_slow instead,
which is an alias for triangle at radius 1.
In regards to the last point, in future I want to make it so that
filters have a filter-specific "preferred radius" (for the ones that
are arbitrarily tunable), once the configuration system for filters has
been redesigned (in particular in a way that will let us separate scale
and scale-down cleanly). That way, "triangle" can simply have the
preferred radius of 1 by default, while still being tunable. (Rather
than the default radius being hard-coded to 3 always)
2015-03-25 03:40:28 +00:00
|
|
|
static int validate_window_opt(struct mp_log *log, const m_option_t *opt,
|
|
|
|
struct bstr name, struct bstr param)
|
|
|
|
{
|
|
|
|
char s[20] = {0};
|
|
|
|
int r = 1;
|
|
|
|
if (bstr_equals0(param, "help")) {
|
|
|
|
r = M_OPT_EXIT - 1;
|
|
|
|
} else {
|
|
|
|
snprintf(s, sizeof(s), "%.*s", BSTR_P(param));
|
|
|
|
const struct filter_window *window = mp_find_filter_window(s);
|
|
|
|
if (!window)
|
|
|
|
r = M_OPT_INVALID;
|
|
|
|
}
|
|
|
|
if (r < 1) {
|
|
|
|
mp_info(log, "Available windows:\n");
|
|
|
|
for (int n = 0; mp_filter_windows[n].name; n++)
|
|
|
|
mp_info(log, " %s\n", mp_filter_windows[n].name);
|
|
|
|
if (s[0])
|
|
|
|
mp_fatal(log, "No window named '%s' found!\n", s);
|
|
|
|
}
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2015-02-07 12:54:18 +00:00
|
|
|
float gl_video_scale_ambient_lux(float lmin, float lmax,
|
|
|
|
float rmin, float rmax, float lux)
|
|
|
|
{
|
|
|
|
assert(lmax > lmin);
|
|
|
|
|
|
|
|
float num = (rmax - rmin) * (log10(lux) - log10(lmin));
|
|
|
|
float den = log10(lmax) - log10(lmin);
|
|
|
|
float result = num / den + rmin;
|
|
|
|
|
|
|
|
// clamp the result
|
|
|
|
float max = MPMAX(rmax, rmin);
|
|
|
|
float min = MPMIN(rmax, rmin);
|
|
|
|
return MPMAX(MPMIN(result, max), min);
|
|
|
|
}
|
|
|
|
|
|
|
|
void gl_video_set_ambient_lux(struct gl_video *p, int lux)
|
|
|
|
{
|
|
|
|
if (p->opts.gamma_auto) {
|
|
|
|
float gamma = gl_video_scale_ambient_lux(16.0, 64.0, 2.40, 1.961, lux);
|
2015-03-04 19:18:14 +00:00
|
|
|
MP_VERBOSE(p, "ambient light changed: %dlux (gamma: %f)\n", lux, gamma);
|
2015-02-07 12:54:18 +00:00
|
|
|
p->opts.gamma = MPMIN(1.0, 1.961 / gamma);
|
|
|
|
gl_video_eq_update(p);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-11-03 23:00:18 +00:00
|
|
|
void gl_video_set_hwdec(struct gl_video *p, struct gl_hwdec *hwdec)
|
|
|
|
{
|
|
|
|
p->hwdec = hwdec;
|
vo_opengl: refactor how hwdec interop exports textures
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
2016-05-10 16:29:10 +00:00
|
|
|
unref_current_image(p);
|
2013-11-03 23:00:18 +00:00
|
|
|
}
|