This is a straightforward parallel implementation of error diffusion
algorithms in a compute shader. Basically we use a single work group
with the maximal possible size to process the whole image. After a
shift mapping we are able to process all pixels column by column.
A large ring buffer is allocated in shared memory to speed things up.
However, the size of the required shared memory depends linearly on the
height of the video window (or the screen height in fullscreen mode).
If there is not enough shared memory, it will fall back to
`--dither=fruit`.
The maximal allowed work group size is hardcoded as 1024. Ideally we
could query `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS`. But for whatever
reason, it seems most high-end cards from nvidia and amd support only
the minimal required value, so I guess we can stick to it for now.
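For illustration, a minimal C sketch of the two capability checks
described above (not the actual mpv code; the helper name is
hypothetical, and the caller is assumed to have computed the ring
buffer's shared memory requirement from the image height):

```c
// Assumes an OpenGL 4.3 context and headers that expose these enums
// (e.g. through an extension loader).
#include <GL/gl.h>

static GLboolean can_use_error_diffusion(GLint required_shmem_bytes)
{
    GLint max_invocations = 0, max_shmem = 0;
    glGetIntegerv(GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS, &max_invocations);
    glGetIntegerv(GL_MAX_COMPUTE_SHARED_MEMORY_SIZE, &max_shmem);

    // The work group size is hardcoded to 1024 (the spec's minimum
    // required maximum), so in practice only the shared memory
    // requirement, which grows linearly with the image height, can
    // rule the shader out and trigger the --dither=fruit fallback.
    return max_invocations >= 1024 && required_shmem_bytes <= max_shmem;
}
```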
I assume (but cannot confirm) that VA-AP-API is in fact a typo, because
most if not all search engine results related to it are from mpv's manual
page.
By changing this to VA-API and clarifying that it requires VA-API
support on the system, we can hopefully make it clear to unsuspecting
Windows users that this is not the filter they're looking for.
Concerns #6690.
This allows selecting the drm mode using a string specification. You
can select the preferred mode, the mode with the highest resolution, a
mode specified as WxH[@R], or a mode by its index in the list of modes,
as before.
This was implemented by using OPT_STRING_VALIDATE for drm-mode,
instead of OPT_INT. Using a string here also prepares for future
additions to drm-mode that aim to allow specifying a mode by its
resolution.
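A rough sketch of how such a string specification could be validated
and parsed (a hypothetical helper, not the actual mpv validator; the
accepted keywords and struct layout here are assumptions):

```c
#include <stdio.h>
#include <string.h>

struct mode_spec {
    enum { SPEC_PREFERRED, SPEC_HIGHEST, SPEC_BY_NUMBERS, SPEC_BY_IDX } type;
    unsigned width, height, idx;
    double refresh;   // 0 if no @R part was given
};

// Returns 0 on success, -1 on a validation error.
static int parse_mode_spec(const char *s, struct mode_spec *out)
{
    char tail;
    memset(out, 0, sizeof(*out));
    if (!strcmp(s, "preferred")) {
        out->type = SPEC_PREFERRED;
    } else if (!strcmp(s, "highest")) {
        out->type = SPEC_HIGHEST;
    } else if (sscanf(s, "%ux%u@%lf%c", &out->width, &out->height,
                      &out->refresh, &tail) == 3) {
        out->type = SPEC_BY_NUMBERS;          // WxH@R
    } else if (sscanf(s, "%ux%u%c", &out->width, &out->height, &tail) == 2) {
        out->type = SPEC_BY_NUMBERS;          // WxH
    } else if (sscanf(s, "%u%c", &out->idx, &tail) == 1) {
        out->type = SPEC_BY_IDX;              // plain index, as before
    } else {
        return -1;
    }
    return 0;
}
```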
It is useful when debugging to be able to force atomic off, or as a
workaround if atomic breaks for some user. Legacy modesetting is less
likely to break by virtue of being a less complex API.
half of the materials we used were deprecated with macOS 10.14, broken
and not supported by runtime changes of the macOS theme. furthermore
our styling names were completely inconsistent with the actual look
since macOS 10.14, e.g. ultradark got a lot brighter and couldn't be
considered ultradark anymore.
i decided to drop the old option --macos-title-bar-style and rework
the whole mechanism to allow more freedom. now materials and appearance
can be set separately. even if apple changes the look or semantics in
the future the new options can be easily adapted.
Manual changes done:
* Merged the interface-changes under the already master'd changes.
* Moved the hwdec-related option changes to video/decode/vd_lavc.c.
Rather than the linear cd/m^2 units, these (relative) logarithmic units
lend themselves much better to actually detecting scene changes,
especially since the scene averaging was changed to also work
logarithmically.
In theory our "eye adaptation" algorithm works in both ways, both
darkening bright scenes and brightening dark scenes. But I've always
just prevented the latter with a hard clamp, since I wanted to avoid
blowing up dark scenes into looking funny (and full of noise).
But allowing a tiny bit of over-exposure might be a good thing. I won't
change the default just yet (better let users test), but a moderate
value of 1.2 might be better than the current 1.0 limit. Needs testing
especially on dark scenes.
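As a toy illustration of the clamp being discussed (hypothetical
names, not the actual shader code), the adaptation gain could be
limited like this, with the limit raised from 1.0 to e.g. 1.2 to allow
a bit of over-exposure:

```c
#include <math.h>

// target_peak / scene_peak is the gain that would fully adapt the
// scene; gains above 1.0 brighten dark scenes, which the hard clamp
// used to forbid entirely.
static float adaptation_gain(float scene_peak, float target_peak,
                             float max_boost /* 1.0 today, maybe 1.2 */)
{
    float gain = target_peak / fmaxf(scene_peak, 1e-6f);
    return fminf(gain, max_boost);
}
```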
The previous approach of using an FIR with a tunable hard threshold for
scene changes had several problems:
- the FIR involved annoying hard-coded buffer sizes, high VRAM usage,
and the FIR sum was prone to numerical overflow, which limited the
number of frames we could average over.
- the hard scene change detection was prone to both false positives and
false negatives, each with their own (annoying) issues.
Scrap this entirely and switch to a dual approach of using a simple
single-pole IIR low pass filter to smooth out noise, while using a
softer scene change curve (with tunable low and high thresholds), based
on `smoothstep`. The IIR filter is extremely simple in its
implementation and has an arbitrarily user-tunable cutoff frequency,
while the smoothstep-based scene change curve provides a good, tunable
tradeoff between adaptation speed and stability - without exhibiting
either of the traditional issues associated with the hard cutoff.
Another way to think about the new options is that the "low threshold"
provides a margin of error within which we don't care about small
fluctuations in the scene (which will therefore be smoothed out by the
IIR filter).
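A simplified C sketch of the dual approach (hypothetical names, not
the actual vo_gpu code):

```c
#include <math.h>

static float smoothstep(float lo, float hi, float x)
{
    x = fminf(fmaxf((x - lo) / (hi - lo), 0.0f), 1.0f);
    return x * x * (3.0f - 2.0f * x);
}

// One update step: `state` is the smoothed brightness carried across
// frames, `measured` is this frame's measurement.
static float update_state(float state, float measured,
                          float cutoff_hz, float frame_dt,
                          float thresh_low, float thresh_high)
{
    // Single-pole IIR low pass; the coefficient follows from the
    // user-tunable cutoff frequency and the frame duration.
    float coeff = 1.0f - expf(-6.2831853f * cutoff_hz * frame_dt);
    float smoothed = state + coeff * (measured - state);

    // Soft scene change: below thresh_low we trust the IIR output,
    // above thresh_high we jump straight to the new measurement, with
    // a smooth blend in between instead of a hard cutoff.
    float weight = smoothstep(thresh_low, thresh_high,
                              fabsf(measured - state));
    return smoothed + weight * (measured - smoothed);
}
```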
Instead of desaturating towards luma, we desaturate towards the
per-channel tone mapped version. This essentially provides a smooth
roll-off towards the "hollywood"-style (non-chromatic) tone mapping
algorithm, which works better for bright content, while continuing to
use the "linear" style (chromatic) tone mapping algorithm for primarily
in-gamut content.
We also split up the desaturation algorithm into strength and exponent,
which allows users to use less aggressive desaturation settings without
affecting the overall curve.
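Roughly, and with hypothetical names (the real shader may compute this
differently), the split looks like this:

```c
#include <math.h>

// Weight used to mix the chromatic ("linear") result towards the
// per-channel ("hollywood") result; 0 = fully chromatic, 1 = fully
// per-channel. strength scales the effect, exponent shapes the curve.
static float desat_weight(float sig, float strength, float exponent)
{
    float overshoot = fmaxf(sig - 1.0f, 0.0f);
    return fminf(strength * powf(overshoot, exponent), 1.0f);
}
```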
Too many broken hardware decoders. Noticed wrong decoding of a video
file encoded with x262 on RX Vega when using VAAPI (Mesa 18.3.2).
Looks fine with swdec and a cheap hardware BD player.
Reverts 017f3d0674
--record-file is nice, but only sometimes. If you watch some sort of
livestream which you want to record, it's actually much nicer not to
record what you're currently "seeing", but everything you're receiving.
This option has been deprecated upstream for a long time, probably
doesn't even work anymore, and won't work moving forwards as we replace
the vulkan code by libplacebo wrappers.
I haven't removed the option completely yet since in theory we could
still add support for e.g. a native glslang wrapper in the future. But
most likely the future of this code is deletion.
As an aside, fix an issue where the man page didn't mention d3d11.
This commit bumps the libmpv version to 1.102
drm-osd-plane -> drm-draw-plane
drm-video-plane -> drm-drmprime-video-plane
drm-osd-size -> drm-draw-surface-size
"draw plane", as in the plane that OpenGL draws to, whether it be
video + OSD or just OSD.
"drmprime video plane", as in the plane used for hwdec video imported
via drmprime.
"draw surface size", as in the size of the surface used for the draw plane
The new names are invariant whether or not hwdec_drmprime_drm is being
used or not. The original naming was very confusing, as when doing
regular rendering (swdec or vaapi) the video would be displayed on the
"OSD plane", and the "Video plane" would remain unused.
Add general primary/overlay plane option to drm-osd-plane-id and
drm-video-plane-id, so that the user can just request any usable
primary or overlay plane for either of these two options. This should
be somewhat more user-friendly (especially as neither of these two
options currently has a useful help function), as usually you would
only be interested in the type of the plane, and not exactly which
plane gets picked.
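For illustration, picking "any usable plane of a given type" with
libdrm could look roughly like this (a hedged sketch; the helper and
how it would be wired into the options are hypothetical, only the
libdrm calls themselves are real):

```c
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

// Return the id of the first plane whose "type" property matches
// wanted_type (e.g. DRM_PLANE_TYPE_PRIMARY or DRM_PLANE_TYPE_OVERLAY),
// or 0 if none was found.
static uint32_t pick_plane_by_type(int fd, uint64_t wanted_type)
{
    uint32_t result = 0;
    drmModePlaneResPtr res = drmModeGetPlaneResources(fd);
    if (!res)
        return 0;

    for (uint32_t i = 0; i < res->count_planes && !result; i++) {
        uint32_t plane_id = res->planes[i];
        drmModeObjectPropertiesPtr props =
            drmModeObjectGetProperties(fd, plane_id, DRM_MODE_OBJECT_PLANE);
        if (!props)
            continue;
        for (uint32_t j = 0; j < props->count_props; j++) {
            drmModePropertyPtr prop =
                drmModeGetProperty(fd, props->props[j]);
            if (prop && !strcmp(prop->name, "type") &&
                props->prop_values[j] == wanted_type)
                result = plane_id;
            drmModeFreeProperty(prop);
        }
        drmModeFreeObjectProperties(props);
    }
    drmModeFreePlaneResources(res);
    return result;
}
```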
Despite their place in the tree, hwdecs can be loaded and used just
fine by the vulkan GPU backend.
In this change we add Vulkan interop support to the cuda/nvdec hwdec.
The overall process is mostly straightforward, so the main observation
here is that I had to implement it using an intermediate Vulkan buffer
because the direct VkImage usage is blocked by a bug in the nvidia
driver. When that gets fixed, I will revisit this.
Nevertheless, the intermediate buffer copy is very cheap as it's all
device memory from start to finish. Overall CPU utilisation is pretty
much the same as with the OpenGL GPU backend.
Note that we cannot use a single intermediate buffer - rather there
is a pool of them. This is done because the cuda memcpys are not
explicitly synchronised with the texture uploads.
In the basic case, this doesn't matter because the hwdec is not
asked to map and copy the next frame until after the previous one
is rendered. In the interpolation case, we need extra future frames
available immediately, so we'll be asked to map/copy those frames
and vulkan will be asked to render them. So far, harmless right? No.
All the vulkan rendering, including the upload steps, are batched
together and end up running very asynchronously from the CUDA copies.
The end result is that all the copies happen one after another, and
only then do the uploads happen, which means all textures are uploaded
the same, final, frame data. Whoops. Unsurprisingly, this results in
jerky motion because every 3 or 4 frames are identical.
The buffer pool ensures that we do not overwrite a buffer that is
still waiting to be uploaded. The ra_buf_pool implementation
automatically checks if existing buffers are available for use and
only creates a new one if it really has to. It's hard to say for sure
what the maximum number of buffers might be but we believe it won't
be so large as to make this strategy unusable. The highest I've seen
is 12 when using interpolation with tscale=bicubic.
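Conceptually (a toy sketch with hypothetical names, not the real
ra_buf_pool code), the pool behaves like this:

```c
#include <stdbool.h>
#include <stdlib.h>

struct pool_buf {
    void *data;
    bool in_flight;   // still waiting to be consumed by a texture upload
};

struct buf_pool {
    struct pool_buf **bufs;
    int num_bufs;
};

// Hand out a buffer that is no longer referenced by a pending upload;
// grow the pool only when every existing buffer is still in flight, so
// a CUDA copy can never overwrite data that hasn't been uploaded yet.
static struct pool_buf *pool_get(struct buf_pool *pool, size_t size)
{
    for (int i = 0; i < pool->num_bufs; i++) {
        if (!pool->bufs[i]->in_flight)
            return pool->bufs[i];
    }
    struct pool_buf *buf = calloc(1, sizeof(*buf));
    buf->data = malloc(size);
    pool->bufs = realloc(pool->bufs,
                         (pool->num_bufs + 1) * sizeof(*pool->bufs));
    pool->bufs[pool->num_bufs++] = buf;
    return buf;
}
```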
A future optimisation here is to synchronise the CUDA copies with
respect to the vulkan uploads. This can be done with shared semaphores
that would ensure the copy of the second frame only happens after the
upload of the first frame, and so on. This isn't trivial to implement
as I'd have to first adjust the hwdec code to use asynchronous cuda;
without that, there's no way to use the semaphore for synchronisation.
This should result in fewer intermediate buffers being required.
Since linear downscaling makes sense to handle independently from
linear/sigmoid upscaling, we split this option up. Now,
linear-downscaling is its own option that only controls linearization
when downscaling and nothing more. Likewise, linear-upscaling /
sigmoid-upscaling are two mutually exclusive options (the latter
overriding the former) that apply only to upscaling and no longer
implicitly enable linear light downscaling as well.
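In pseudo-C, the new decision logic described above amounts to the
following (a hypothetical helper, not the actual renderer code):

```c
#include <stdbool.h>

// Decide whether to scale in linear light, and whether to additionally
// sigmoidize, given the three options described above.
static bool use_linear_light(bool downscaling,
                             bool opt_linear_downscaling,
                             bool opt_linear_upscaling,
                             bool opt_sigmoid_upscaling,
                             bool *use_sigmoid)
{
    *use_sigmoid = false;
    if (downscaling)
        return opt_linear_downscaling;     // nothing else applies
    if (opt_sigmoid_upscaling) {
        *use_sigmoid = true;               // overrides linear-upscaling
        return true;                       // sigmoid implies linearization
    }
    return opt_linear_upscaling;
}
```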
The old behavior was very confusing, as evidenced by issues such
as #6213. The current behavior should make much more sense, and only
minimally breaks backwards compatibility (since using linear-scaling
directly was very uncommon - most users got this for free as part of
gpu-hq and relied only on that).
Closes #6213.
Someone on IRC pointed out that the default stats bindings weren't
documented in the interactive control section of the manual, so
let's add them with a short mention and a reference to the STATS
section of the manual.
by default the pixel format creation falls back to the software renderer
when everything fails. this is mostly needed for VMs. additionally one
can directly request an sw renderer or exclude it entirely.
The demuxer cache is the only cache now. Might need another change to
combat seeking failures in mp4 etc. The only bad thing is the loss of
cache-speed, which was sort of nice to have.
duration is parsed as an integer, and the default value is used if `-1` is passed. Passing `-` as described here causes a parameter value error.
The player fully restarts playback when the edition or disk title is
changed. Before this, the player tried to reinitialize playback
partially. For example, it did not print a new "Playing: <file>"
message, and did not send playback end to libmpv users (scripts or
applications).
This playback restart code was a bit messy and could have unforeseen
interactions with various state. There have been bugs before. Since it's
a mostly cosmetic thing for an obscure feature, just change it to a full
restart. This works well, though since it may have consequences for
scripts or client API users, mention it in interface-changes.rst.