Missed during the recent changes.
Also simplify error checking code and check for POLLNVAL
as well (the display fd was never actually checked to be valid).
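A minimal sketch of what the check looks like (names are illustrative, not
the actual mpv code), assuming a poll() loop over the display fd and a
wakeup fd:

    #include <poll.h>
    #include <stdbool.h>

    // Treat POLLERR, POLLHUP and POLLNVAL on the display fd as a broken
    // or invalid connection, instead of only waiting for POLLIN.
    static bool wait_display(int display_fd, int wakeup_fd, int timeout_ms)
    {
        struct pollfd fds[2] = {
            { .fd = display_fd, .events = POLLIN },
            { .fd = wakeup_fd,  .events = POLLIN },
        };
        poll(fds, 2, timeout_ms);
        if (fds[0].revents & (POLLERR | POLLHUP | POLLNVAL))
            return false;   // display connection gone or fd invalid
        return true;        // data to read, or just woken up / timed out
    }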
This requires changing the pixel upload alignment because the odd sizes
might not be aligned to multiples of 4.
Anyway, the restriction has no real benefit, and the sizes between 32
and 64 might be worth using, so just drop it.
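As an illustration of the alignment part (a sketch only; the texture
format and function name are assumptions, not taken from the actual code),
the upload has to drop the default unpack alignment of 4, since the row
size of an odd-sized LUT need not be a multiple of 4 bytes:

    // Assumes a GL loader providing glTexImage3D, and a 16-bit RGB LUT.
    static void upload_3dlut(GLuint tex, int size, const void *data)
    {
        glBindTexture(GL_TEXTURE_3D, tex);
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);  // odd sizes break the default of 4
        glTexImage3D(GL_TEXTURE_3D, 0, GL_RGB16, size, size, size, 0,
                     GL_RGB, GL_UNSIGNED_SHORT, data);
        glPixelStorei(GL_UNPACK_ALIGNMENT, 4);  // restore the GL default
    }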
Following testing after ebe798a, this is a more than sufficient size to
cover our use case.
With the old code, the old default measured about 58 dB PSNR, and this new
default measures about 65 dB PSNR, so it's actually an improvement despite
resulting in a smaller size.
There was no outlier whatsoever when comparing sizes in the neighbourhood
of 64 (every step corresponding to a PSNR drop of about 0.07 dB), so I
picked this one since it's a power of two and requires no change to the
current 3dlut-size parsing logic.
I also tested smaller sizes such as 32x32x32, which performed almost as
well on colorful samples but resulted in a noticeable black boost in the
dark regions, which is pretty undesirable. Therefore, we should avoid
going much below 64x64x64.
Either way, this new size is so fast to compute that the 3dlut cache is
almost useless on my end. In fact, it might even be slower to load the
profile from the cache than to recompute it from scratch. (This applies to
caches on disk; for a cache on tmpfs, it makes no difference.)
It seems vo_x11_check_events() was supposed to return the currently
flagged events and reset them. But there are many places where
vo_x11_check_events() is called without checking its return value. This
could lead to forgotten events.
Change the code such that they can't get lost.
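A rough sketch of one way to do this (names are illustrative, and the real
change may look different): accumulate flags in the state struct and always
hand them to the VO core, so no caller has to remember to check a return
value:

    struct x11_state {
        int pending_vo_events;  // VO_EVENT_* flags collected while reading X events
    };

    // deliver() stands in for however the VO core accumulates events.
    static void x11_flush_events(struct x11_state *x11, void (*deliver)(int events))
    {
        int events = x11->pending_vo_events;
        x11->pending_vo_events = 0;
        if (events)
            deliver(events);  // always forwarded, so nothing can get lost
    }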
This code had the exact same texture indexing bug that the original
scaler code had before the introduction of the LUT_POS macro to fix it.
We can re-use the same macro here, and the performance cost is negligible.
The benefit is greatly improved LUT accuracy as the 3DLUT size decreases -
in particular, the old code introduced more and more black crush the lower
the LUT size got (because the error was essentially an over-contrast bias,
with a magnitude inversely related to the LUT size).
The new code improves black stability as the LUT size decreases, and
only at very low values (16 and below) do black levels start noticeably
getting affected (due to crude linearization of the nonlinear response
curve).
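For reference, a scalar C equivalent of what the LUT_POS remapping does
(the real code generates a GLSL expression; this is just the math): a
normalized coordinate in [0,1] is moved so that 0 samples the center of the
first texel and 1 the center of the last one, for a texture with `size`
texels along the sampled axis.

    static float lut_pos(float x, int size)
    {
        // Equivalent to mix(0.5 / size, 1.0 - 0.5 / size, x) in GLSL.
        return 0.5f / size + x * (1.0f - 1.0f / size);
    }

Without the remapping, the lookup is effectively stretched by a factor of
about size/(size-1) and clamped at the ends, which is the over-contrast
bias described above.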
The default value of 3dlut-size is definitely generous enough for this
to make no difference out of the box, but it also causes no performance
drop at all on my machine so I see no harm in improving the logic.
Furthermore, this means we could easily decrease the default 3dlut size
in a future commit, perhaps even down to 64x64x64 as a default. (But
more testing is warranted here)
Both backends have code to close each FD of their wakeup_pipe array.
This array is default-initialized with 0, which means if the backends
exit before the wakeup pipe is created (e.g. when probing), they would
close FD 0.
Initialize the FDs with -1. Then we call close(-1) in these situations,
which simply fails with EBADF and has no bad consequences.
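Minimal sketch of the pattern (struct and function names are made up for
illustration):

    #include <unistd.h>

    struct backend_state {
        int wakeup_pipe[2];
    };

    static void backend_preinit(struct backend_state *st)
    {
        st->wakeup_pipe[0] = st->wakeup_pipe[1] = -1;  // pipe not created yet
    }

    static void backend_uninit(struct backend_state *st)
    {
        for (int n = 0; n < 2; n++)
            close(st->wakeup_pipe[n]);  // close(-1) just fails with EBADF
    }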
This fits natively into the vo/backend and makes it possible to simplify
the polling code.
One new change is the fact that surface_handle_enter flags VO_EVENT_WIN_STATE
and VO_EVENT_RESIZE instead of only VO_EVENT_WIN_STATE. Before this, the code
hackily relied on the timeout and the loop in the wait_frame function to track
and set the scaling factor. Instead, this triggers mpv to run a schedule_resize
and adjust the new VO output dimensions immediately. This is also more accurate
since surface_handle_enter() gets called when a surface is created, moved and
resized, which is exactly what the rest of the player might be interested in.
This uses GLSL mix() instead of going through an indirect texture
access. Easy to implement, and it might require fewer resources on some
devices, since the oversample code was already essentially just a
special case of this.
Could be made the new default (as per issue #2685), but that should be
done in a separate commit.
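A scalar C sketch of the blend the shader performs (the real code emits
GLSL; names here are illustrative):

    // out = mix(a, b, coeff) in GLSL terms; per the message above, the
    // oversample path is essentially a special case of this blend.
    static void blend_pixel(const float a[3], const float b[3], float coeff,
                            float out[3])
    {
        for (int i = 0; i < 3; i++)
            out[i] = a[i] * (1.0f - coeff) + b[i] * coeff;
    }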
Until now, this has been either handled over vo.event_fd (which should
go away), or by putting event handling on a separate thread. The
backends which do the latter do it for a reason and won't need this, but
X11 and Wayland will, in order to get rid of event_fd.
There's no need to call wl_display_flush() since all the client-side
buffered data has already been flushed prior to polling the fd.
Instead only check for POLLIN and the usual ERR+HUP.
Don't just cause vo_opengl to update the ICC profile every time the
window is moved. Instead, explicitly check if the screen was changed.
Mostly untested.
This determines whether VA_FRAME_PICTURE or VA_BOTTOM_FIELD is passed in
VAProcPipelineParameterBuffer.flags for progressive frames that should be
force-deinterlaced. VA-VPP doesn't
really seem to care, and we can get rid of mp_refqueue_is_interlaced()
entirely. It could be argued it's better to pass field flags instead of
the progressive flag.
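Sketch of the flag selection (the condition names are illustrative; the
VA_* values are standard VA-API constants):

    #include <stdbool.h>
    #include <va/va.h>

    static unsigned int pick_field_flag(bool deinterlace, bool top_field)
    {
        if (!deinterlace)
            return VA_FRAME_PICTURE;                        // leave the frame alone
        return top_field ? VA_TOP_FIELD : VA_BOTTOM_FIELD;  // force field handling
    }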
"Real" frame flag vs. what we pretend it to be. It always used the real
flag, and thus never deinterlaced unflagged frames, even if the
suboption was set to "no".
The hw_subfmt field roughly corresponds to the field
AVHWFramesContext.sw_format in ffmpeg. The ffmpeg one is of the type
AVPixelFormat (instead of the underlying hardware format), so it's a
good idea to switch to this as well, in preparation.
Now the hw_subfmt field is an mp_imgfmt instead of an opaque/API-
specific number. VDPAU and Direct3D11 already used mp_imgfmt, but
Videotoolbox and VAAPI had to be switched.
One somewhat user-visible change is that the verbose log will now always
show the hw_subfmt as an image format, instead of as a nonsensical number.
(In the end it would be good if we could switch to AVHWFramesContext
completely, but the upstream API is incomplete and doesn't cover
Direct3D11 and Videotoolbox.)
This should get mpv working on Windows 7 machines without hardware
accelerated graphics adapters. It already worked on Windows 8 and up
because those systems would silently fall back to WARP if there was no
graphics hardware installed.
The normal MPGL_CAP_SW flag is not set, so unlike other opengl backends,
this will choose a software adapter even if opengl:sw is not specified.
The reason for this is that, unlike on Linux, where vo_xv and vo_x11 can
be used, mpv on Windows has no VO to fall back on when hardware
acceleration isn't available, so if software adapters are rejected, the
user won't see any video output when using the default settings. WARP
seems to perform quite well, so it should be used in this case.
I made this call up because I sort of thought it made sense. I'm now
convinced that it does not, and that it is even actively harmful. I'm
still quite in the dark about how the Direct3D 11 video API is supposed to
work with regard to synchronization, but at least for normal graphics this
call would not make much sense.
These mostly happen in situations where the correct behavior is
relatively new and not found in the wild (therefore not worth
implementing) and/or extremely complicated (and thus not worth worrying
about the potential edge cases and UI changes).
Still, it's best to document these where they happen to guide the poor
souls maintaining these files in the future.
This uses eglPostSubBufferNV to trigger ANGLE to check the window size
and update the size of the swapchain to match, which is recommended
here: https://groups.google.com/d/msg/angleproject/RvyVkjRCQGU/gfKfT64IAgAJ
With the D3D11 backend, using eglPostSubBufferNV with a 0-sized update
region will even skip the Present() call, meaning it won't block for a
vsync period. Hopefully ANGLE will have a less hacky way of doing this
in future. See the relevant ANGLE issue: http://anglebug.com/1438
Fixes #3301
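Hedged sketch of the call (the entry point is loaded at runtime since it's
an NV extension):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    static void angle_sync_swapchain_size(EGLDisplay display, EGLSurface surface)
    {
        PFNEGLPOSTSUBBUFFERNVPROC PostSubBufferNV =
            (PFNEGLPOSTSUBBUFFERNVPROC)eglGetProcAddress("eglPostSubBufferNV");
        // A 0-sized update region makes ANGLE re-check the window size
        // (and, on the D3D11 backend, skip the Present() call).
        if (PostSubBufferNV)
            PostSubBufferNV(display, surface, 0, 0, 0, 0);
    }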
This can for example happen with vo_opengl_cb, if it is used with a GL
implementation that does not support FBOs. (mpv itself should never
attempt to use FBOs if they're not available.)
Without this check it would trigger an assert() in our dummy
glBindFramebuffer wrapper.
Suspected cause of #3308, although it's still unlikely.
For example it should be set to IMGFMT_D3D11NV12 if it isn't already.
Otherwise, an assertion in vf.c could trigger.
This probably couldn't be provoked yet.
This moves some of the bulky user-shader specific logic into the file
dedicated to it. Rather than expose video.c state, variable lookup is
now done via a simulated closure.
This greatly improves the result when decoding typical (ST.2084) HDR
content, since the job of tone mapping gets significantly easier when
you're only mapping from 1000 to 250, rather than 10000 to 250.
The difference is so drastic that we can now even reasonably use
`hdr-tone-mapping=linear` and get a very perceptually uniform result
that is only slightly darker than normal. (To compensate for the extra
dynamic range)
Due to weird implementation details, this only seems to be present on
keyframes (or something like that), so we have to cache the last seen
value for the frames in between.
Also, in some files the metadata is just completely broken /
nonsensical, so I decided to apply a simple heuristic to detect
completely broken metadata.
This involves multiple changes:
1. Brightness metadata is split into nominal peak and signal peak.
For a quick and dirty explanation: nominal peak is the brightest value
that your color space can represent (i.e. the brightness of an encoded
1.0), and signal peak is the brightest value that actually occurs in
the video (i.e. the brightest thing that's displayed).
2. vo_opengl uses a new decision logic to figure out the right nom_peak
and sig_peak for all situations. It also does a better job of picking
the right target gamut/colorspace to use for the OSD (which still is, and
still should be, treated as sRGB). This change in logic also fixes #3293
en passant.
3. Since it was growing rapidly, the logic for auto-guessing / inferring
the right colorimetry configuration (in pass_colormanage) was split from
the logic for actually performing the adaptation (now pass_color_map).
Right now, the new logic doesn't do a whole lot since HDR metadata is
still ignored (but not for long).
This has two reasons:
1. I tend to add new fields to this metadata, and every time I've done
so I've consistently forgotten to update all of the dozens of places in
which this colorimetry metadata might end up getting used. While most
usages don't really care about most of the metadata, sometimes the
intent was simply to “copy” the colorimetry metadata from one struct to
another. With this being inside a substruct, those lines of code can now
simply read a.color = b.color without having to care about added or
removed fields (see the sketch below).
2. It makes the type definitions nicer for upcoming refactors.
In going through all of the usages, I also expanded a few where I felt
that omitting the “young” fields was a bug.
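The sketch (type and field names loosely follow mpv's conventions but are
illustrative here):

    struct mp_colorspace {
        int space;      // YUV matrix
        int levels;     // TV vs. PC range
        int primaries;
        int gamma;      // transfer characteristics
    };

    struct mp_image_params {
        int w, h;
        struct mp_colorspace color;
    };

    static void copy_colorimetry(struct mp_image_params *dst,
                                 const struct mp_image_params *src)
    {
        dst->color = src->color;  // keeps working as fields are added/removed
    }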
Commit 883d3114 seems to have (accidentally?) dropped the FBOTEX_FUZZY
from the output_fbo resize, which means that current master will keep
resizing and resizing the FBO as you change the window size, introducing
severe memory leaks after a while. (Not sure why that would cause
memory leaks, but I blame nvidia)
Either way, it's bad for performance too, so it's worth fixing.
vo_vaapi is the only thing which can't scale RGBA on the GPU. (Other
cases of RGBA scaling are handled in draw_bmp.c for some reason.)
Move this code and get rid of the osd_conv_cache thing.
Functionally, nothing changes.
This is how PBOs are normally supposed to be used.
Unfortunately, I can't see any absolute improvement with the nVidia binary
drivers when playing 4K material. Compared to the "old" PBO path with 1
buffer, the measured GL time decreases significantly, though.
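For reference, a sketch of the general technique (not the actual mpv code;
assumes a GL loader provides the buffer-object entry points): rotate
through a small ring of pixel-unpack buffers so the copy into buffer N can
overlap the texture upload from the previous one.

    #define NUM_PBOS 3

    struct pbo_ring {
        GLuint pbo[NUM_PBOS];
        int index;
    };

    static void pbo_upload(struct pbo_ring *p, GLuint tex, int w, int h,
                           const void *data, size_t size)
    {
        p->index = (p->index + 1) % NUM_PBOS;
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, p->pbo[p->index]);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW); // orphan
        glBufferSubData(GL_PIXEL_UNPACK_BUFFER, 0, size, data);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA,
                        GL_UNSIGNED_BYTE, NULL);  // NULL = offset 0 into the PBO
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    }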
GL generally does not support flipping the image on upload, meaning
negative strides are not supported. vo_opengl handles this by flipping
rendering if the stride is inverted, and gl_pbo_upload() "ignores"
negative strides by uploading without flipping the image.
If individual planes had strides with different signs, this broke. The
flipping affected the entire image, and only the sign of the first plane
was respected.
This is just a crazy corner case that will never happen, but it turns
out this is quite simple to support, and actually improves the code
somewhat.
This introduces a gl_pbo_upload_tex() function, which works almost like
our gl_upload_tex() glTexSubImage2D() wrapper, except it takes a struct
which caches the PBO handles. It also takes the full texture size (to
make allocating an ideal buffer size easier), and a parameter to disable
PBOs (so that the caller doesn't have to duplicate the gl_upload_tex()
call if PBOs are disabled or unavailable).
This also removes warnings and fallbacks on PBO failure. We just
silently try using PBOs on every frame, and if that fails at some point,
revert to normal texture uploads. Probably doesn't matter.
This makes the geometry of the sizing borders more like the ones in
Windows 10. It also fixes an off-by-one error that made the right and
bottom borders thinner than the left and top borders, which made it
difficult to resize the window when using the Windows 7 classic theme
(because it has pretty thin sizing borders to begin with.)
No method of taking a screenshot was implemented at all. vo_opengl
lacked window screenshotting, because ANGLE doesn't allow reading the
frontbuffer. There was no way to read back from a D3D11 texture either.
Implement reading image data from D3D11 textures. This is a low-quality
effort to get basic screenshots done. Eventually there will be a better
implementation: once we use AVHWFramesContext natively, the readback
implementation will be in libavcodec, and will be able to cache the
staging texture correctly. Hopefully. (For now it doesn't even have an
AVHWFramesContext for D3D11 yet. But the abstraction is more appropriate
for this purpose.)
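Hedged sketch of the staging-texture readback (error handling trimmed; the
staging texture is created per call here, whereas a better implementation
would cache it, as noted above):

    #define COBJMACROS
    #include <d3d11.h>

    static HRESULT read_texture(ID3D11Device *dev, ID3D11DeviceContext *ctx,
                                ID3D11Texture2D *tex,
                                D3D11_MAPPED_SUBRESOURCE *map_out,
                                ID3D11Texture2D **staging_out)
    {
        D3D11_TEXTURE2D_DESC desc;
        ID3D11Texture2D_GetDesc(tex, &desc);
        desc.Usage = D3D11_USAGE_STAGING;            // CPU-readable copy
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
        desc.BindFlags = 0;
        desc.MiscFlags = 0;

        HRESULT hr = ID3D11Device_CreateTexture2D(dev, &desc, NULL, staging_out);
        if (FAILED(hr))
            return hr;
        ID3D11DeviceContext_CopyResource(ctx, (ID3D11Resource *)*staging_out,
                                         (ID3D11Resource *)tex);
        return ID3D11DeviceContext_Map(ctx, (ID3D11Resource *)*staging_out, 0,
                                       D3D11_MAP_READ, 0, map_out);
    }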
OK, this was dumb. The file didn't have much to do with ANGLE, and the
functionality can simply be moved to d3d.c. That file contains helpers
for decoding, but can always be present (on Windows) since it doesn't
access any D3D specific libavcodec APIs. Thus it doesn't need to be
conditionally built like the actual hwaccel wrappers.
Instead of hard-coding a big list, move some of the functionality
to csputils. Affects both the auto-guess blacklist and the peak
estimation.
Also update the comments.
Too many "exceptions" these days, it's easier to just hard-code a
whitelist instead of a blacklist. And besides, it only really makes
sense to avoid adaptation for BT.601 specifically, since that's the one
we auto-guess based on the resolution.
I'm not even sure why we ever consulted *_src to begin with, since that
just describes the current image format - and not the original metadata.
(And in fact, we specifically had logic to work around the implications
this had on linear scaling.)
image_params is *the* authoritative source on the intended (i.e.
reference) image metadata, whereas *_src may be changed by previous
passes already. So only consult image_params for picking auto-generated
values.
Also, add some more missing "wide gamut" and "non-gamma" curves to the
autoconfig blacklist. (Maybe it would make sense to move this list to
csputils in the future? Or perhaps even auto-detect it based on the
associated primaries)
User request and not that hard. Closes #3157.
Note that FFmpeg doesn't support this and there's no signalling in HEVC
etc., so the only way users can access it is by using vf_format
manually.
Mind: This encoding uses full range values, not TV range.
This is actually not entirely trivial since it involves negative Yxy
coordinates, so the CMM has to be capable of full floating point
operation. Fortunately, LittleCMS is, so we can just blindly implement
it.
This HDR function is unique in that it's still display-referred, it just
allows for values above the reference peak (super-highlights). The
official standard doesn't actually document this very well, but the
nominal peak turns out to be exactly 12.0 - so we normalize to this
value internally in mpv. (This lets us preserve the property that the
textures are encoded in the range [0,1], preventing clipping and making
the best use of an integer texture's range)
This was grouped together with SMPTE ST2084 when checking libavutil
compatibility, since they were added in the same release window.
The main framebuffer is not the default framebuffer for the dxinterop
backend. Bind the main framebuffer and use the appropriate attachment
when reading the window content.
Fixes #3284
The size check introduced in commit d941a57b did not consider that Xv
can round up the image size to the next chroma boundary. Doing that
makes sense, so it can't certainly be considered server misbehavior.
Do 2 things against this: allow if the server returns a larger image (we
just crop it then), and also allocate a properly aligned image in the
first place.
See previous commit. (The mixing case was never supported, so this has
equivalent functionality.)
This also implicitly fixes a bug: the old code allocated image data for
the cropped surface size only, which means the
video_surface_get_bits_y_cb_cr call could write beyond the allocated
image memory.
For now, this affects taking screenshots only.
If the video mixer is actually needed, we want to go through the video
mixer for screenshots too - which is why this function simply always
used the video mixer, and then copied the surface to RAM as RGBA.
Add reading the surface as nv12 if possible. If the format is correct,
and no special video processing (like deinterlacing) is used, then it's
read directly without going through the video mixer.
There's no particular reason for doing this, other than avoiding the
video mixer (like vo_opengl can do now). Also, the next commit will
make use of this in vf_vdpaurb.c.
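Hedged sketch of the direct readback (plane pointers and pitches come from
the destination image; not the actual vf_vdpaurb/vo code):

    #include <vdpau/vdpau.h>

    static VdpStatus read_surface_nv12(VdpVideoSurfaceGetBitsYCbCr *get_bits,
                                       VdpVideoSurface surface,
                                       void *y_plane, uint32_t y_pitch,
                                       void *uv_plane, uint32_t uv_pitch)
    {
        void *planes[2] = { y_plane, uv_plane };
        uint32_t pitches[2] = { y_pitch, uv_pitch };
        // Reads Y and interleaved CbCr directly, no video mixer involved.
        return get_bits(surface, VDP_YCBCR_FORMAT_NV12, planes, pitches);
    }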
The GL_ARB_timer_query extension and thus the GL_TIME_ELAPSED constant
don't exist for GLES.
For ES, the EXT_disjoint_timer_query extension is used, so take the
constant from that; otherwise, provide it manually.
See PR #3216, which introduced this error.
Until now, we've always converted vdpau video surfaces to RGB, and then
mapped the resulting RGB texture. Change this so that the surface is
mapped as NV12 plane textures.
The reason this wasn't done until now is because vdpau surfaces are
mapped in an "interlaced" way as separate fields, even for progressive
video. This requires messy reinterleaving. It turns out that even
though it's an extra processing step, the result can be faster than
going through the video mixer for RGB conversion.
Other than some potential speed-gain, doing this has multiple other
advantages. We can apply our own color conversion, which is important in
more complex cases. We can correctly apply debanding and potentially
other processing that requires chroma-specific or in-YUV handling.
If deinterlacing is enabled, this switches back to the old RGB
conversion method. Until we have at least a primitive deinterlacer in
vo_opengl, this will stay this way. The d3d11 and vaapi code paths are
similar. (Of course these don't require any crazy field reinterleaving.)
Otherwise stale references will survive forever. Could leak hardware
video surfaces.
In particular, the mpv vdpau code crashed with an assertion when exiting
after toggling deinterlacing, because not all references were released.
1. this basically reverts commit de4c74e5a4.
even with CVDisplayLinkCreateWithActiveCGDisplays and
CVDisplayLinkSetCurrentCGDisplayFromOpenGLContext we still have to
explicitly set the current display ID, otherwise it will just always
choose the display with the lowest refresh rate. another weird thing is,
we still have to set the display ID another time with
CVDisplayLinkSetCurrentCGDisplay after the link was started. otherwise
the display period is 0 and the fallback will be used.
if we ever use the callback method for something useful it's probably
better to use CVDisplayLinkCreateWithActiveCGDisplays since we will need
to keep the display link around instead of releasing it at the end.
in that case we have to call CVDisplayLinkSetCurrentCGDisplay two times,
once before and once after LinkStart.
2. add windowDidChangeScreen delegate to update the display refresh rate
when mpv is moved to a different screen.
We have two problems here.
1. CVDisplayLinkGetActualOutputVideoRefreshPeriod, like the name suggests,
returns a frame period and not a refresh rate. using this as screen_fps
just leads to a slideshow. why didn't this break video playback on OS X
completely? the answer to this leads us to the second problem.
2. it seems that CVDisplayLinkGetActualOutputVideoRefreshPeriod always
returns 0 if used without CVDisplayLinkSetOutputCallback and hence always
fell back to CVDisplayLinkGetNominalOutputVideoRefreshPeriod. adding a
callback to CVDisplayLink solves this problem. the callback function at
this moment doesn't do anything but could possibly be used in the future.
This can also remove all the stuff for lazily attaching the texture. It
doesn't matter if the dxinterop backend changes the bound framebuffer
during a VOCTRL, since the renderer does not rely on the GL state being
preserved.
The previous few commits changed sd_lavc.c's output to packed RGB sub-
images. In particular, this means all sub-bitmaps are part of a larger,
single bitmap. Change the vo_opengl OSD code such that it can make use
of this, and upload the pre-packed image, instead of packing and copying
them again.
This complicates the upload code a bit (4 code paths due to messy PBO
handling). The plan is to make sub-bitmaps always packed, but some more
work is required to reach this point. The plan is to pack libass images
as well. Since this implies a copy, this will make it easy to refcount
the result.
(This is all targeted towards vo_opengl. Other VOs, in particular vo_xv,
vo_x11, and vo_wayland, will become less efficient. Although at least
vo_vdpau and vo_direct3d could be switched to the new method as well.)
Until now, subtitle renderers could export SUBBITMAP_INDEXED, which is an
8 bit per pixel paletted format. sd_lavc.c was the only renderer
doing this, and the result was converted to RGBA in every use-case
(except maybe when the subtitles were hidden.)
Change it so that sd_lavc.c converts to RGBA on its own. This simplifies
everything a bit, and the palette handling can be removed from the
common code.
This is also preparation for making subtitle images refcounted. The
"caching" in img_convert.c is a PITA in this respect, and needs to be
redone. So getting rid of some img_convert.c code is a positive side-
effect. Also related to refcounted subtitles is packing them into a
single mp_image. Fewer objects to refcount is easier, and for the libass
format the same will be done. The plan is to remove manual packing from
the VOs which need single images entirely.
Apply the padding internally to each input bitmap, instead of requiring
this for the semi-public API.
Right now, everything still uses packer_pack_from_subbitmaps() to fill
the input bitmap sizes, but that's going to change with the following
commit. Since bitmap_packer.in is mutated during packing anyway, it's
more convenient to add the padding automatically.
Also, guarantee that every sub-bitmap has a padding border around it.
Don't let the padding overlap. Add padding even on the containing
borders.
This is simpler, doesn't cost much in memory usage, and is convenient
for one of the following commits.
This is the ES equivalent to ARB_timer_query. It enables the performance
timers on ANGLE. All the added functions should be identical in
semantics to their desktop GL equivalents.
The OpenGL 3.0+ and ES specs are quite clear on what values are
accepted for the attachment object name parameter. And there's no
overlap for the default framebuffer. Sigh.
Probably fixes Mesa raising an error in this case and might fix#3251.
Regression caused by the previous vo_opengl change.
Until now, we've used system-specific API (GLX, EGL, etc.) to retrieve
the depth of the default framebuffer. (We equate this with the display
depth and use the determined depth for dithering.)
We can actually retrieve this value through standard GL API, and it
works everywhere (except GLES 2 of course). This simplifies everything a
great deal.
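Hedged sketch of the query (GL 3.0+; on GLES 3 the attachment is named
GL_BACK rather than GL_BACK_LEFT, per the earlier note about the
non-overlapping enums):

    static void query_fb_depth(GLint *r, GLint *g, GLint *b)
    {
        glBindFramebuffer(GL_FRAMEBUFFER, 0);  // default framebuffer
        glGetFramebufferAttachmentParameteriv(GL_FRAMEBUFFER, GL_BACK_LEFT,
                GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE, r);
        glGetFramebufferAttachmentParameteriv(GL_FRAMEBUFFER, GL_BACK_LEFT,
                GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE, g);
        glGetFramebufferAttachmentParameteriv(GL_FRAMEBUFFER, GL_BACK_LEFT,
                GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE, b);
    }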
egl_helpers.c is empty now. But I expect that some EGL boilerplate will
be moved to it, so don't remove it yet.