Theoretically possible (and quite unlikely due to the small texture
size). The code was originally written with the assumption that texture
allocations can't fail, and it was never updated out of laziness.
Untested.
As we are less and less interested in vdpau, with nvdec and vaapi
being the better choices in general on nvidia and AMD respectively, we
might consider removing direct_mode, where we bypass the vdpau
mixer and work directly with yuv textures. Normally, working with
yuv textures would be great, but vdpau built in an assumption that
all frames are delivered as separate fields, causing us to have
to re-interleave them.
nvidia then introduced a new OpenGL extension that can return the
yuv frames as whole frames, but we can't just unconditionally switch to
it, as we want to keep supporting older hardware whose drivers
are no longer getting new features. The end result is that we
wouldn't be able to get rid of the old code paths.
Removing direct_mode means we always use the mixer, and work with
rgba frame textures. There are some theoretical limitations to
this, but in practice they probably don't matter much - unsupported
colourspaces don't matter because without 10bit decoding support,
we can't use them anyway, and apparently we're not doing separate
chroma scaling these days, so scaling the rgba doesn't really lose
anything (and the vdpau hq scaling option remains available).
Pretty annoying affair. The vo_gpu code could of course not trigger
rendering from filters yet, so it needed to be extended. Also, this uses
some icky stuff made for vf_sub (and this was the reason I marked vf_sub
as deprecated), so everything is terrible.
Handling the window with this function makes no sense, since windows
and kernels are not the same thing and don't share the same option list.
The only reason it's done is to make sure the char* points at the static
string rather than the dynamically allocated one, which we can do
manually in this function. Rewrite a bit for clarity/quality.
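Roughly the pattern in question, as a standalone sketch (the names are
made up; this is not the actual option code):

    #include <stdlib.h>
    #include <string.h>

    static const char *const builtin_name = "bilinear";

    // Make the stored pointer refer to the static name instead of a heap
    // copy, so nothing downstream has to track ownership.
    static void canonicalize(char **val)
    {
        if (*val && strcmp(*val, builtin_name) == 0) {
            free(*val);
            *val = (char *)builtin_name;
        }
    }

    int main(void)
    {
        char *v = malloc(16);
        strcpy(v, "bilinear");
        canonicalize(&v);
        return 0;
    }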
pass_color_map() (in video_shaders.c) and pass_colormanage() (video.c)
both duplicate the condition on whether to do peak computation. Peak
computation requires a compute shader, so if the duplicated conditions
don't match, video_shaders.c will generate a compute shader, but video.c
will try to run it as fragment shader. This leads to a "blue screen".
This can be reproduced by playing an HDTV video with --target-peak=99.
It's not clear how to fix this. Should pass_tone_map() be only invoked
if mp_trc_is_hdr() == true (what pass_colormanage() uses to decide
whether to enable peak computation), or should pass_colormanage() just
tell pass_color_map() to skip peak computation? Decide for the latter,
as it's more robust.
Even if not correct, at least it gets rid of the blue shit.
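A standalone sketch of the chosen fix's shape (made-up names, not the
actual mpv signatures): the caller evaluates the condition once and
passes it down, so the two sites can never disagree:

    #include <stdbool.h>
    #include <stdio.h>

    // Stand-in for the caller deciding whether peak detection is wanted.
    static bool want_peak_detect(bool src_is_hdr, bool have_compute)
    {
        return src_is_hdr && have_compute;
    }

    // Stand-in for pass_color_map(): it no longer re-derives the
    // condition, it just obeys the flag it was given.
    static void color_map(bool detect_peak)
    {
        if (detect_peak)
            printf("generate compute shader with peak detection\n");
        else
            printf("generate plain fragment shader\n");
    }

    int main(void)
    {
        bool peak = want_peak_detect(true, true);
        color_map(peak);  // both sides now share the same decision
        return 0;
    }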
Fixes: #7149
Probably. It's not like these pixel formats are formally specified -
FFmpeg added them because _some_ file format or decoder supports it, and
while that format/codec may define it precisely, the pixel format is
sort of disconnected and just an FFmpeg thing.
In any case, the yuva sample I had at hand uses the full range the
component data type can provide. The old code used the same "shifted"
range as for Y/U/V components, which must have been wrong.
This will not work correctly for packed YUVA formats, but fortunately
they matter even less.
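For 8-bit data the difference looks like this (generic formulas for
illustration, not mpv code): limited ("TV") range luma maps 16..235 to
0..1, while a full range component maps 0..255 to 0..1:

    #include <stdio.h>

    // Limited-range 8-bit luma: 16..235 -> 0..1 (the "shifted" range).
    static double norm_limited_luma(int v) { return (v - 16) / 219.0; }

    // Full-range 8-bit component, which is what alpha should use.
    static double norm_full(int v) { return v / 255.0; }

    int main(void)
    {
        // A fully opaque alpha sample of 255:
        printf("shifted range: %f\n", norm_limited_luma(255)); // ~1.09, wrong
        printf("full range:    %f\n", norm_full(255));         // exactly 1.0
        return 0;
    }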
For some reason, the first frame displayed on X11 with amdgpu and OpenGL
will be garbled. This is especially visible if the player starts,
displays a frame, but then still takes a while to properly start
playback.
With --interpolation, the behavior somehow changes (usually gets worse).
I'm not sure what exactly is going on, and the code in video.c is way
too abstruse. Maybe there is some slight possibility that a frame with
uncleared contents gets displayed, which somehow also corrupts another
frame that is displayed immediately after that.
If clear is unconditionally run, this somehow doesn't happen, and you
see a video frame. By any logic this shouldn't happen: a video frame
should always overwrite the background. So I can't exclude that this
is some sort of driver bug, or at least a very obscure interaction.
Clearing should be practically free anyway, so always do it.
Fixes: #7105
This lets us set primaries, transfer function and the target peak
based on what the presenting layer would want us to have.
Now that this mechanism is available, warn if the user has
overridden values such as primaries or transfer function.
Using e.g. --vf=format=0bgr showed obviously wrong colors with --vo=gpu.
The reason is that leading padding wasn't handled correctly.
Try to hack-fix it. While the code in copy_image() is somewhat
reasonable, I can't tell what the fuck is going on with that HOOKED
shit. For some reason it doesn't use copy_image() (???),
or uses it incorrectly. It affects debanding. --deband=no works
correctly. If it's enabled, the crap in hook_prelude() is needed.
I bet there are many more bugs with this. For example, the deband shader
will try to deband the alpha channel if the format abgr is used (because
the correct component order is only established later). This can be
tested by inserting a "color.x = 0;" at the end of the deband shader,
and using --vf=format=rgba vs. abgr.
I cannot comprehend why it doesn't just store explicitly which
components a texture contains, and why it doesn't just always read the
components in a uniform way.
There's a big chance this fix works only by coincidence. This shouldn't
have been so hard either. Time for a complete rewrite?
Internally, vo_gpu uses NaN for some options to indicate a default value
that is different depending on the context (e.g. different scalers).
There are 2 problems with this:
1. you couldn't reset the options to their defaults
2. NaN is a damn mess and shouldn't be part of the API
The option parser already rejected NaN explicitly, which is why 1.
didn't work. Regarding 2., JSON might be a good example, and actually
caused a bug report.
Fix this by mapping NaN to the special value "default". I think I'd
prefer other mechanisms (maybe just having every scaler expose separate
options?), but for now this will do. See you in a future commit, which
painfully deprecates this and replaces it with something else.
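As an illustration of the mapping (a standalone sketch, not the actual
m_option code):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    // Print an option value: NaN is the internal "use context default"
    // marker, which the user-visible side now spells as "default".
    static void print_opt(double v)
    {
        if (isnan(v))
            puts("default");
        else
            printf("%g\n", v);
    }

    // The reverse direction when parsing user input.
    static double parse_opt(const char *s)
    {
        return strcmp(s, "default") == 0 ? NAN : atof(s);
    }

    int main(void)
    {
        print_opt(parse_opt("default"));  // prints: default
        print_opt(parse_opt("2.5"));      // prints: 2.5
        return 0;
    }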
I refrained from using "no" (my favorite magic value for "unset" etc.)
because then I'd have to e.g. make --no-scale-param1 work, which in
addition to being a lot of effort looks dumb, and nobody would use it.
Here's also an apology for the shitty added test script.
Fixes: #6691
In some cases, src.sig_peak remains undefined as 0, which was definitely
the case when using the OSD, since it never got passed through the usual
color space normalization process. The most robust work-around is to simply
force the normalization at the site where it's needed. This ensures this
value is always valid and defined, to make the peak-dependent logic in
these two functions always work.
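The shape of the work-around, sketched standalone (made-up struct, not
the actual mpv types):

    #include <stdio.h>

    struct color {
        double sig_peak;  // 0 means "was never set"
    };

    // Force the normalization at the site where the value is consumed, so
    // a peak of 0 can never reach the peak-dependent logic.
    static void normalize_peak(struct color *c, double nom_peak)
    {
        if (c->sig_peak <= 0)
            c->sig_peak = nom_peak;
    }

    int main(void)
    {
        struct color osd = {0};     // e.g. the OSD, which skipped the
        normalize_peak(&osd, 1.0);  // usual normalization
        printf("sig_peak = %f\n", osd.sig_peak);
        return 0;
    }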
Fixes 4b25ec3a9d
Fixes #6917
Fixes #6918
Error diffusion requires two texture rendering passes. The existing code
reuses `screen_tex` and creates another texture for this purpose. This
works generally well for opengl, but could potentially be problematic for
vulkan, due to its async nature.
This is a straightforward parallel implementation of error diffusion
algorithms in a compute shader. Basically we use a single work group with
the maximal possible size to process the whole image. After a shift
mapping we are able to process all pixels column by column.
A large ring buffer is allocated in shared memory to speed things up.
However, the size of the required shared memory depends linearly on the
height of the video window (or the screen height in fullscreen mode). In
case there is not enough shared memory, it will fall back to `--dither=fruit`.
The maximal allowed work group size is hardcoded as 1024. Ideally we
could query `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS`. But for whatever
reason, it seems most high end cards from nvidia and amd support only
the minimal required value, so I guess we can stick to it for now.
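A rough sketch of the feasibility check implied here (the per-row cost is
a made-up placeholder; the real requirement depends on the diffusion
kernel):

    #include <stdbool.h>
    #include <stdio.h>

    // The ring buffer lives in shared memory and grows with the image
    // height, so tall windows may not fit into the GL limit.
    static bool error_diffusion_fits(int height, int shmem_limit_bytes)
    {
        int per_row_bytes = 3 * (int)sizeof(int);  // placeholder factor
        return height * per_row_bytes <= shmem_limit_bytes;
    }

    int main(void)
    {
        int shmem = 32 * 1024;  // the GL-guaranteed minimum
        if (!error_diffusion_fits(2160, shmem))
            printf("falling back to --dither=fruit\n");
        else
            printf("error diffusion fits in shared memory\n");
        return 0;
    }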
The calculation of the scale factor involves 32-bit floats, and a strict
equality test will effectively ignore the `--scaler-resizes-only` option
for some non-integer scale factors.
Fix this by using a non-strict equality check.
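A sketch of the kind of tolerance check meant here (the epsilon is
arbitrary, for illustration only):

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    // Treat two scale factors as equal if they differ by less than eps, so
    // last-bit float rounding doesn't defeat --scaler-resizes-only.
    static bool scale_factor_eq(float a, float b)
    {
        return fabsf(a - b) < 1e-6f;
    }

    int main(void)
    {
        // Two ways of computing the "same" non-integer factor can differ
        // in the last bit when 32-bit floats are involved.
        float a = 697.0f / 519.0f;
        float b = (float)(697.0 / 519.0);
        printf("strict: %d  tolerant: %d\n", a == b, scale_factor_eq(a, b));
        return 0;
    }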
The old size limit was chosen before LUT textures were supported in user
shaders. At that time, the whole user shader would be compiled and run
on the GPU, which made large user shaders impractical to use.
With the introduction of LUT textures, the old size limit doesn't make
any sense. For example, a 1024x1024 rgba16f LUT will cost 32MB of shader
size.
Fix this by increasing the size limit to a value that's unlikely to be
reached.
Manual changes done:
* Merged the interface-changes under the already master'd changes.
* Moved the hwdec-related option changes to video/decode/vd_lavc.c.
Before this commit, the texture offset was set after all source textures
were finalized, which means CHROMA hooks won't be able to align with
luma planes. This could be problematic for chroma prescalers utilizing
information from the luma plane.
Fix this by finding the reference texture early and setting the global
texture offset early.
This solves some edge cases when using files with very weird metadata
(e.g. MaxCLL 10k and so forth). Instead of just blindly seeding it with
the tagged metadata, forcibly set the initial state from the detected
values.
Rather than the linear cd/m^2 units, these (relative) logarithmic units
lend themselves much better to actually detecting scene changes,
especially since the scene averaging was changed to also work
logarithmically.
This change switches to a logarithmic mean to estimate the average
signal brightness. This handles dark scenes with isolated highlights
much more faithfully than the linear mean did, since the log of the
signal roughly corresponds to the perceptual brightness.
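In formula terms this is the classic log-average, exp(mean(log(Y))),
instead of mean(Y). A tiny standalone illustration (not the shader code):

    #include <math.h>
    #include <stdio.h>

    // Log-average (geometric mean) of luminance samples; tracks perceived
    // brightness better than the arithmetic mean.
    static double log_average(const double *y, int n)
    {
        double acc = 0.0;
        for (int i = 0; i < n; i++)
            acc += log(fmax(y[i], 1e-6));  // avoid log(0)
        return exp(acc / n);
    }

    int main(void)
    {
        // A dark scene with one isolated highlight:
        double y[4] = {0.01, 0.01, 0.01, 1.0};
        double linear = (0.01 + 0.01 + 0.01 + 1.0) / 4.0;
        printf("linear mean: %.4f  log mean: %.4f\n", linear,
               log_average(y, 4));
        return 0;
    }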
In theory our "eye adaptation" algorithm works in both ways, both
darkening bright scenes and brightening dark scenes. But I've always
just prevented the latter with a hard clamp, since I wanted to avoid
blowing up dark scenes into looking funny (and full of noise).
But allowing a tiny bit of over-exposure might be a good thing. I won't
change the default just yet (better let users test), but a moderate
value of 1.2 might be better than the current 1.0 limit. Needs testing
especially on dark scenes.
The previous approach of using an FIR with tunable hard threshold for
scene changes had several problems:
- the FIR involved annoying hard-coded buffer sizes, high VRAM usage,
and a FIR sum that was prone to numerical overflow, which limited the
number of frames we could average over.
- the hard scene change detection was prone to both false positives and
false negatives, each with their own (annoying) issues.
We also totally redesign the scene change detection: scrap the old
approach entirely and switch to a dual approach of using a simple
single-pole IIR low pass filter to smooth out noise, while using a
softer scene change curve (with tunable low and high thresholds), based
on `smoothstep`. The IIR filter is extremely simple in its
implementation and has an arbitrarily user-tunable cutoff frequency,
while the smoothstep-based scene change curve provides a good, tunable
tradeoff between adaptation speed and stability - without exhibiting
either of the traditional issues associated with the hard cutoff.
Another way to think about the new options is that the "low threshold"
provides a margin of error within which we don't care about small
fluctuations in the scene (which will therefore be smoothed out by the
IIR filter).
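A scalar sketch of the two pieces working together (illustrative numbers
and names; the real thing runs in a shader):

    #include <math.h>
    #include <stdio.h>

    // Single-pole IIR low pass: the state moves towards the input by a
    // fraction derived from the user-tunable cutoff.
    static double iir_step(double state, double input, double coeff)
    {
        return state + coeff * (input - state);
    }

    // Soft scene-change curve: 0 below lo, 1 above hi, smooth in between.
    static double smooth_step(double lo, double hi, double x)
    {
        double t = fmin(fmax((x - lo) / (hi - lo), 0.0), 1.0);
        return t * t * (3.0 - 2.0 * t);
    }

    int main(void)
    {
        double smoothed = 100.0, frame = 400.0;
        double w = smooth_step(50.0, 300.0, fabs(frame - smoothed));
        // Small fluctuations get smoothed by the IIR filter; large jumps
        // snap to the new value via the scene-change weight.
        double out = (1 - w) * iir_step(smoothed, frame, 0.1) + w * frame;
        printf("scene weight %.2f -> %.1f\n", w, out);
        return 0;
    }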
Instead of desaturating towards luma, we desaturate towards the
per-channel tone mapped version. This essentially provides a smooth
roll-off towards the "hollywood"-style (non-chromatic) tone mapping
algorithm, which works better for bright content, while continuing to
use the "linear" style (chromatic) tone mapping algorithm for primarily
in-gamut content.
We also split up the desaturation algorithm into strength and exponent,
which allows users to use less aggressive desaturation settings without
affecting the overall curve.
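Roughly the shape of the strength/exponent split, as a scalar sketch (the
formula here is illustrative, not the exact shader math):

    #include <math.h>
    #include <stdio.h>

    // How strongly to blend towards the per-channel ("hollywood") result,
    // growing with how far the signal exceeds the displayable range.
    static double desat_weight(double sig, double strength, double exponent)
    {
        double overshoot = fmax(sig - 1.0, 0.0);
        return fmin(strength * pow(overshoot, exponent), 1.0);
    }

    int main(void)
    {
        // In-gamut content stays chromatic; bright overshoots roll off.
        printf("sig 1.0 -> %.3f\n", desat_weight(1.0, 0.75, 1.5));
        printf("sig 2.0 -> %.3f\n", desat_weight(2.0, 0.75, 1.5));
        return 0;
    }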
Add "auto" the possible values of target-peak. The default value
for target_peak is to calculate the target using mp_trc_nom_peak.
Unfortunately, this default was outside the acceptable range of
10-10000 nits, which prevented its later reassignment. So add an
"auto" choice to target-peak which lets clients and scripts go back
to using the trc default after assigning a value.
Since linear downscaling makes sense to handle independently from
linear/sigmoid upscaling, we split this option up. Now,
linear-downscaling is its own option that only controls linearization
when downscaling and nothing more. Likewise, linear-upscaling /
sigmoid-upscaling are two mutually exclusive options (the latter
overriding the former) that apply only to upscaling and no longer
implicitly enable linear light downscaling as well.
The old behavior was very confusing, as evidenced by issues such
as #6213. The current behavior should make much more sense, and only
minimally breaks backwards compatibility (since using linear-scaling
directly was very uncommon - most users got this for free as part of
gpu-hq and relied only on that).
Closes #6213.
When using multiple compute shaders as part of the same pass, there can
be a conflict in the block sizes. In the problematic case, the HDR
detection shader can collide with the polar sampling shader. In this
case, the solution is clear - the passes that can handle any size should
"give in" and not overwrite the block sizes.
Fixes #6083.
This was always a legacy thing. Remove it by applying an orgy of
mp_get_config_group() calls, and sometimes m_config_cache_alloc() or
mp_read_option_raw().
win32 changes untested.
Define a hard-coded value for gl_NumWorkGroups if it is not available.
This adds an additional requirement of needing a shader recompile for
all window size changes.
This was considered a worthwhile compromise as currently f.ex. d3d11
completely lacked any peak computation - this is a major quality of
life upgrade.
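The shape of the workaround, sketched as a standalone helper (made-up
function; in reality this happens during shader generation):

    #include <stdio.h>

    // With gl_NumWorkGroups unavailable, bake the dispatch size into the
    // generated shader source; the shader then needs a recompile whenever
    // the window (and thus the dispatch size) changes.
    static void emit_num_workgroups(char *buf, size_t n, int gx, int gy)
    {
        snprintf(buf, n, "#define gl_NumWorkGroups uvec3(%d, %d, 1)\n",
                 gx, gy);
    }

    int main(void)
    {
        char line[64];
        emit_num_workgroups(line, sizeof(line), 120, 68);
        fputs(line, stdout);
        return 0;
    }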
Also rename stereo3d to stereo_in. The only real change is that the
vo_gpu OSD code now uses the actual stereo 3D mode, instead of the
--video-stereo-mode value. (Why does this vo_gpu code even exist?)
This passed the display size as source size to the renderer, which is of
course nonsense. I don't know what I was doing in 569383bc54.
Yet another fix for those damn anamorphic videos.
As a somewhat redundant/cosmetic change, use image_params instead of
real_image_params in the code above. They should have the same dimensions
(but possibly different formats when doing hw decoding), and mixing them
is confusing. p->image_params wins because it's shorter.
Actually fixes #5619.
We took the storage size instead of the display size for "unscaled"
screenshots. Even if it's called "unscaled", it's still supposed to
scale to compensate for aspect ratio.
(How many commits fixing anamorphic screenshots in various situations
are there?)
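A quick worked example of what "compensate for aspect ratio" means here
(the numbers are illustrative):

    #include <stdio.h>

    int main(void)
    {
        // Anamorphic video: 1440x1080 storage size, 4:3 pixel aspect ratio.
        int w = 1440, h = 1080;
        int par_num = 4, par_den = 3;
        // An "unscaled" screenshot should still use the display size:
        int out_w = w * par_num / par_den;
        printf("screenshot size: %dx%d\n", out_w, h);  // 1920x1080
        return 0;
    }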
Fixes#5619.
This solves a number of problems simultaneously:
1. When outputting HLG, this allows tuning the OOTF based on the display
characteristics.
2. When outputting PQ or other HDR curves, this allows soft-limiting the
output brightness using the tone mapping algorithm.
3. When outputting SDR, this allows HDR-in-SDR style output, by
controlling the output brightness directly.
Closes #5521
The primary need for this change is the fact that the OOTF was
incorrectly scaled, due to the fact that the application of the OOTF can
itself change the required normalization peak. (Plus, an oversight in
pass_inverse_ootf meant we forgot to normalize at the end of it.)
The linearize/delinearize functions still normalize the scale since it's
used in a number of places throughout gpu/video.c, but the color
management function now converts to absolute scale right away, instead
of in an awkward way inside the tone mapping branch. The OOTF functions
now work in absolute scale only.
In addition, minor changes have been made to the way normalization is
handled for tone mapping - we now divide out the dst_peak *after* peak
detection, in order to make the scale of the peak detection buffer
consistent even if the dst_peak were to (hypothetically) change
mid-stream. In theory, we could also do this for desaturation, but doing
the desaturation before tone mapping has the advantage of preserving
much more brightness than the other way around - and even mid-stream
changes are not that drastic here.
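A scalar sketch of the ordering described above (placeholder tone curve,
made-up names):

    #include <stdio.h>

    // Peak detection sees the absolute-scale signal; dst_peak is only
    // divided out afterwards, so the peak buffer's scale stays consistent
    // even if dst_peak were to change mid-stream.
    static double map_pixel(double sig_abs, double dst_peak, double *peak_buf)
    {
        if (sig_abs > *peak_buf)
            *peak_buf = sig_abs;
        double mapped = sig_abs > dst_peak ? dst_peak : sig_abs;  // stand-in curve
        return mapped / dst_peak;  // normalize to the output range last
    }

    int main(void)
    {
        double detected = 0;
        double out = map_pixel(600.0, 203.0, &detected);
        printf("out = %f, detected peak = %f\n", out, detected);
        return 0;
    }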
Finally, some preparation work has been done for allowing the user to
customize the `dst.sig_peak` in the future.
Similar spirit to edb4970ca8. check_gl_features() has a confusing
early-return. This also adds compute_hdr_peak to the list of options
that is copied to the dumb-mode options struct, since it seems to make a
difference. Otherwise it would be impossible to disable HDR peak
detection in dumb mode.