Despite their place in the tree, hwdecs can be loaded and used just
fine by the Vulkan GPU backend.
In this change we add Vulkan interop support to the cuda/nvdec hwdec.
The overall process is mostly straightforward, so the main observation
here is that I had to implement it using an intermediate Vulkan
buffer, because direct VkImage usage is blocked by a bug in the nvidia
driver. When that gets fixed, I will revisit this.
Nevertheless, the intermediate buffer copy is very cheap, as it's all
device memory from start to finish. Overall CPU utilisation is pretty
much the same as with the OpenGL GPU backend.
Note that we cannot use a single intermediate buffer - rather there
is a pool of them. This is done because the CUDA memcpys are not
explicitly synchronised with the texture uploads.
In the basic case, this doesn't matter, because the hwdec is not
asked to map and copy the next frame until after the previous one
is rendered. In the interpolation case, we need extra future frames
available immediately, so we'll be asked to map/copy those frames
and Vulkan will be asked to render them. So far, harmless, right? No.
All the Vulkan rendering, including the upload steps, is batched
together and ends up running very asynchronously from the CUDA copies.
The end result is that all the copies happen one after another, and
only then do the uploads happen, which means all the textures are
uploaded with the same, final, frame data. Whoops. Unsurprisingly,
this results in jerky motion, because 3 out of every 4 frames are
identical.
The buffer pool ensures that we do not overwrite a buffer that is
still waiting to be uploaded. The ra_buf_pool implementation
automatically checks whether existing buffers are available for use
and only creates a new one if it really has to. It's hard to say for
sure what the maximum number of buffers might be, but we believe it
won't be so large as to make this strategy unusable. The highest I've
seen is 12, when using interpolation with tscale=bicubic.
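As a rough sketch of the strategy (illustrative C only; the names here
are made up and this is not the actual ra_buf_pool code):

    #include <stdbool.h>
    #include <stdlib.h>

    struct buf { bool upload_pending; /* e.g. fence not yet signaled */ };

    struct buf_pool {
        struct buf **bufs;
        int num_bufs;
    };

    /* Reuse a buffer only once its previous upload has completed;
     * otherwise grow the pool. */
    static struct buf *pool_get(struct buf_pool *pool)
    {
        for (int n = 0; n < pool->num_bufs; n++) {
            if (!pool->bufs[n]->upload_pending)
                return pool->bufs[n]; /* safe to overwrite */
        }
        /* All buffers still have uploads in flight: allocate another. */
        struct buf *b = calloc(1, sizeof(*b));
        pool->bufs = realloc(pool->bufs,
                             (pool->num_bufs + 1) * sizeof(pool->bufs[0]));
        pool->bufs[pool->num_bufs++] = b;
        return b;
    }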
A future optimisation here is to synchronise the CUDA copies with
respect to the Vulkan uploads. This can be done with shared semaphores
that would ensure the copy of the second frame only happens after the
upload of the first frame, and so on. This isn't trivial to implement,
as I'd have to first adjust the hwdec code to use asynchronous CUDA;
without that, there's no way to use the semaphores for
synchronisation. This should result in fewer intermediate buffers
being required.
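For the record, a sketch of what that could look like with CUDA's
external semaphore API (the semaphore import and descriptor setup are
omitted, and the variable names are made up for illustration):

    #include <cuda.h>

    /* Both semaphores are assumed to have been exported from Vulkan
     * and imported with cuImportExternalSemaphore(). */
    void copy_frame_synced(CUexternalSemaphore upload_done_sem,
                           CUexternalSemaphore copy_done_sem,
                           const CUDA_MEMCPY2D *copy_desc, CUstream stream)
    {
        CUDA_EXTERNAL_SEMAPHORE_WAIT_PARAMS wait_params = {0};
        CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS signal_params = {0};

        /* Don't overwrite the buffer until Vulkan's upload finished. */
        cuWaitExternalSemaphoresAsync(&upload_done_sem, &wait_params, 1,
                                      stream);
        /* The (asynchronous) frame copy itself. */
        cuMemcpy2DAsync(copy_desc, stream);
        /* Tell Vulkan the new frame data is ready. */
        cuSignalExternalSemaphoresAsync(&copy_done_sem, &signal_params, 1,
                                        stream);
    }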
Since linear downscaling makes sense to handle independently of
linear/sigmoid upscaling, we split this option up. Now,
linear-downscaling is its own option that only controls linearization
when downscaling and nothing more. Likewise, linear-upscaling /
sigmoid-upscaling are two mutually exclusive options (the latter
overriding the former) that apply only to upscaling and no longer
implicitly enable linear light downscaling as well.
The old behavior was very confusing, as evidenced by issues such
as #6213. The current behavior should make much more sense, and only
minimally breaks backwards compatibility (since using linear-scaling
directly was very uncommon - most users got this for free as part of
gpu-hq and relied only on that).
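For example, the two stages can now be controlled independently:

    # linearize only when downscaling:
    --linear-downscaling=yes
    # sigmoidize only when upscaling (mutually exclusive with
    # linear-upscaling, which it overrides):
    --sigmoid-upscaling=yes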
Closes #6213.
Someone on IRC pointed out that the default stats bindings weren't
documented in the interactive control section of the manual, so
let's add them with a short mention and a reference to the STATS
section of the manual.
By default, pixel format creation falls back to the software renderer
when everything else fails. This is mostly needed for VMs.
Additionally, one can directly request a software renderer or exclude
it entirely.
The duration is parsed as an integer, and the default value is used
if -1 is passed. Passing "-" as described here causes a parameter
value error.
The player fully restarts playback when the edition or disc title is
changed. Before this, the player tried to partially reinitialize
playback. For example, it did not print a new "Playing: <file>"
message, and did not send a playback-end event to libmpv users
(scripts or applications).
This playback restart code was a bit messy and could have unforeseen
interactions with various state. There have been bugs before. Since it's
a mostly cosmetic thing for an obscure feature, just change it to a full
restart. This works well, though since it may have consequences for
scripts or client API users, mention it in interface-changes.rst.
The only effective difference is that the former explicitly checks
whether the JSON value type is string, and errors out if not. The rest
is exactly the same (mpv_set_property_string is mpv_set_property with
MPV_FORMAT_STRING).
It seems silly to keep this, so just remove it.
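For reference, the equivalence in client API terms (minimal C sketch):

    #include <mpv/client.h>

    /* These two calls do the same thing: */
    mpv_set_property_string(mpv, "pause", "yes");

    char *val = "yes";
    mpv_set_property(mpv, "pause", MPV_FORMAT_STRING, &val);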
With the advent of actual HDR devices, my real measured ICC profile has
an "infinite" contrast, since the display is completely off on pure
black inputs. 100k:1 might not be enough, so let's just bump it up to
1m:1 to be safe.
Also, improve the logging in the case that the detected contrast is too
high by default.
With the internal change from stringlist to keyvaluelist, these
sub-options stop working. I don't really care enough to bring them
back. (Order doesn't matter, -del always seemed annoying.)
We are currently using primary / overlay plane DRM objects, assuming
that the primary plane is the OSD and the overlay plane is the video.
This commit does the following:
- replace the primary / overlay plane members with osd and video plane
  members, dropping that assumption
- add two more options to determine which of the primary / overlay
  planes is associated with the OSD / video
- default the OSD to the overlay plane and the video to the primary
  plane if unspecified
That new API was introduced and allows having several native
resources. This uses that mechanism for DRM resources rather than the
deprecated opengl-cb structs.
This patch therefore adds two structs that can be used with the DRM
atomic interop:
- mpv_opengl_drm_params: holds all the DRM handles
- mpv_opengl_drm_osd_size: holds the OSD layer size
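A rough sketch of how a client might pass these (the param names and
struct fields below are assumptions for illustration, not a verified
copy of render_gl.h):

    mpv_opengl_drm_params drm_params = {
        .fd = drm_fd,                 /* assumed field: DRM device fd */
        .crtc_id = crtc_id,           /* assumed field */
        .connector_id = connector_id, /* assumed field */
    };
    mpv_opengl_drm_osd_size osd_size = {
        .width = 1920, .height = 1080, /* assumed fields */
    };
    mpv_render_param params[] = {
        {MPV_RENDER_PARAM_DRM_DISPLAY, &drm_params},  /* assumed name */
        {MPV_RENDER_PARAM_DRM_OSD_SIZE, &osd_size},   /* assumed name */
        {0},
    };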
This commit adds a drm-osd-size=WxH parameter to the command line,
which allows defining the OSD plane dimensions. The OSD can be
upscaled to the screen resolution when rendering it at video
resolution is too heavy.
This is especially useful for UHD modes on embedded devices where
the GPU cannot handle UHD modes at a decent framerate.
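Example (hypothetical invocation):

    # Render the OSD at 1080p and upscale it to the display mode:
    mpv --drm-osd-size=1920x1080 video.mkv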
Instead of using an internal counter to keep track of the value that was
set last, attempt to find the current value of the property/option in
the value list, and then set the next value in the list.
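For example, with a binding like the following, each press now looks
up the property's current value in the list and applies the one after
it:

    # input.conf (illustrative)
    a cycle-values aspect "16:9" "4:3" "-1"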
There are some potential problems. If a property refuses to accept a
specific value, the cycle-values command will fail, and start from the
same position again. It can't know that it's supposed to skip the next
value. The same can happen to properties which behave "strangely", such
as the "aspect" property, which will return the current aspect if you
write "-1" to it. As a consequence, cycle-values can appear to get
"stuck".
I still think the new behavior is closer to what users expect, and is
generally more useful. We won't restore the ability to get the old
behavior, unless we decide to revert this commit entirely.
Fixes #5772, and hopefully other complaints.
The main change is that we wait with opening the muxer ("writing
headers") until we have data from all streams. This fixes race
conditions at init due to broken assumptions in the old code.
This also changes a lot of other stuff. I found and fixed a few API
violations (often things for which better mechanisms were invented, and
the old ones are not valid anymore). I try to get away from the public
mutex and shared fields in encode_lavc_context. For now it's still
needed for some timestamp-related fields, but most are gone. It also
removes some bad code duplication between audio and video paths.
Attempts to enable the following things:
- let a render API user do "proper" audio-sync video timing itself
- make it possible to not re-render repeated frames if the API user has
better mechanisms available (e.g. waiting for a DisplayLink cycle
instead)
- allow the user to delay or skip redraws if it makes sense
Basically this information will be needed by API users who want to be
"clever" about optimizing timing and rendering.
DR (letting the decoder allocate texture memory) requires running the
allocation on the render thread. This is rather hard with the render
API, because the user controls this thread and when it's entered. It was
not possible until now.
This commit adds a bunch of infrastructure to make this possible. We add
a new optional mode (MPV_RENDER_PARAM_ADVANCED_CONTROL) which basically
lets the user's render thread and libmpv agree how this should be done.
Misuse would lead to deadlocks. To make this less likely, strictly
document thread safety/locking issues. In particular, document which
libmpv functions can be called without issues. (The rest has to be
assumed unsafe.)
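Creation then looks along these lines (sketch, assuming
gl_init_params was already set up for OpenGL):

    #include <mpv/render_gl.h>

    /* Opt in at creation time; MPV_RENDER_PARAM_ADVANCED_CONTROL
     * takes an int (0 or 1). */
    int advanced = 1;
    mpv_render_param params[] = {
        {MPV_RENDER_PARAM_API_TYPE, MPV_RENDER_API_TYPE_OPENGL},
        {MPV_RENDER_PARAM_OPENGL_INIT_PARAMS, &gl_init_params},
        {MPV_RENDER_PARAM_ADVANCED_CONTROL, &advanced},
        {0},
    };
    mpv_render_context *rctx;
    mpv_render_context_create(&rctx, mpv, params);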
The worst issue is destruction of the render context while video is
still active. To avoid certain unintended recursive locks (i.e.
deadlocks, unless we'd make the locks recursive), make the update
callback lock separate. Make "killing" the video chain asynchronous, so
we can do extra work while video is being destroyed.
Because losing wakeups is a big deal, setting the update callback now
triggers a wakeup. (It would have been better if the wakeup callback
were a parameter to mpv_render_context_create(), but too late.)
This commit does not add DR yet; the following commit does this.
Fundamentally, scripts are loaded asynchronously, but as a feature,
there was code to wait until a script is loaded (for a certain arbitrary
definition of "loaded"). This was done in scripting.c with the
wait_loaded() function.
This called mp_idle(), and since there are commands to load/unload
scripts, it meant the player core loop could be entered recursively. I
think this is a major complication and has some problems. For example,
if you had a script that does 'os.execute("sleep inf")', then every
command that loaded another instance of the script would add a new
mp_idle() stack frame. This would lead to some sort of reentrancy
horror that is hard to debug. Also misc/dispatch.c contains a somewhat
tricky mess to support such recursive invocations. There were also some
bugs due to this and due to unforeseen interactions with other messes.
This scripting stuff was the only thing making use of that reentrancy,
and future commands that have "logical" waiting for something should be
implemented differently. So get rid of it.
Change the code to wait only in the player initialization phase: the
only place where it really has to wait is before playback is started,
because scripts might want to set options or hooks that interact with
playback initialization. Unloading of builtin scripts (can happen with
e.g. "set osc no") is left asynchronous; the unloading wasn't too robust
anyway, and this change won't make a difference if someone is trying to
break it intentionally. Note that this is not in mp_initialize(),
because mpv_initialize() uses this by locking the core, which would have
the same problem.
In the future, commands which logically wait should use different
mechanisms. Originally I thought the current approach (that is removed
with this commit) should be used, but it's too much of a mess and can't
even be used in some cases. Examples are:
- "loadfile" should be made blocking (needs to run the normal player
code and manually unblock the thread issuing the command)
- "add-sub" should not freeze the player until the URL is opened (needs
to run opening on a separate thread)
Possibly the current scripting behavior could be restored once new
mechanisms exist, and if it turns out that anyone needs it.
With this commit there should be no further instances of recursive
playloop invocations (other than the case in the following commit),
since all mp_idle()/mp_wait_events() calls are done strictly from the
main thread (and not commands/properties or libmpv client API that
"lock" the main thread).
The first change is about spdif - I mostly ignore spdif issues these
days, but it seems like the recent changes made handling of it slightly
better (but I didn't really test).
The second change is about broken libavfilter filters. We won't restore
the old behavior, because people were complaining about the old behavior
in the past. Possibly we could make libavfilter export this as
metadata and use the old behavior if we know they're broken - but
such a mechanism doesn't exist yet.
Until recently, the AO was reinitialized strictly only on decoder format
changes. But the commit for simplifying audio format negotiation removed
this. Now the AO is recreated for any format change.
This is sort of annoying if you change playback speed. The
insertion/removal of af_scaletempo can change the sample format. For
example, the acompressor filter will convert output to double, so
toggling scaletempo will force the format back to float. This recreates
the AO under the --gapless-audio=weak default. This likely affects a lot
of other filters too.
Work this around by allowing sample format changes, and keeping the
current AO format in these cases. This is probably not a big problem.
Most audio APIs force the output format to float anyway.
This means you actually have to worry about what the default gapless
mode does to your audio. If you start with a file that uses 8 bit per
sample, and then continue playing a 24 bit FLAC, it will be converted
down to 8 bit per sample. (Assuming they are played in a way that uses
the gapless logic.)
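An illustration of that consequence (hypothetical files):

    # Assuming both files end up on the gapless path, the FLAC is
    # converted down to the 8 bit AO format opened for the first file:
    mpv --gapless-audio=weak 8bit.wav 24bit.flac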
One can now set the number of buffers and the buffer size. This can
reduce CPU usage while the total latency stays mostly the same. As
there are sync mechanisms, A/V sync stays intact and working.
It also modifies the 6.1 channel order, as per the OpenAL spec,
and adds AOPLAY_FINAL_CHUNK support.
This uses OpenAL Soft's AL_DIRECT_CHANNELS_SOFT extension, which can
be controlled through a new CLI option, --openal-direct-channels.
This allows one to send the audio data directly to the desired
channels without effects applied.
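Example usage (sketch):

    mpv --ao=openal --openal-direct-channels=yes file.mkv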