Get rid of the old vf.c code. Replace it with a generic filtering
framework, which can potentially handle more than just --vf. At least
reimplementing --af with this code is planned.
This changes some --vf semantics (including runtime behavior and the
"vf" command). The most important ones are listed in interface-changes.
vf_convert.c is renamed to f_swscale.c. It is now an internal filter
that can not be inserted by the user manually.
f_lavfi.c is a refactor of player/lavfi.c. The latter will be removed
once --lavfi-complex is reimplemented on top of f_lavfi.c. (which is
conceptually easy, but a big mess due to the data flow changes).
The existing filters are all changed heavily. The data flow of the new
filter framework is different. Especially EOF handling changes - EOF is
now a "frame" rather than a state, and must be passed through exactly
once.
Another major thing is that all filters must support dynamic format
changes. The filter reconfig() function goes away. (This sounds complex,
but since all filters need to handle EOF draining anyway, they can use
the same code, and it removes the mess with reconfig() having to predict
the output format, which completely breaks with libavfilter anyway.)
In addition, there is no automatic format negotiation or conversion.
libavfilter's primitive and insufficient API simply doesn't allow us to
do this in a reasonable way. Instead, filters can use f_autoconvert as
sub-filter, and tell it which formats they support. This filter will in
turn add actual conversion filters, such as f_swscale, to perform
necessary format changes.
vf_vapoursynth.c uses the same basic principle of operation as before,
but with worryingly different details in data flow. Still appears to
work.
The hardware deint filters (vf_vavpp.c, vf_d3d11vpp.c, vf_vdpaupp.c) are
heavily changed. Fortunately, they all used refqueue.c, which is for
sharing the data flow logic (especially for managing future/past
surfaces and such). It turns out it can be used to factor out most of
the data flow. Some of these filters accepted software input. Instead of
having ad-hoc upload code in each filter, surface upload is now
delegated to f_autoconvert, which can use f_hwupload to perform this.
Exporting VO capabilities is still a big mess (mp_stream_info stuff).
The D3D11 code drops the redundant image formats, and all code uses the
hw_subfmt (sw_format in FFmpeg) instead. Although that too seems to be a
big mess for now.
f_async_queue is unused.
The RA_CAP_FRAGCOORD checks apply to dumb mode as well, but they were
after the check for dumb mode, which returns early, so they never ran.
Fixes#5436
Using vdpau will allocate additional textures for the reinterleaving
step, which uninit_rendering() will free. This is a problem because the
hwdec image remains mapped when reinitializing, so the reinterleaving
textures are turned into dangling pointers. Fix this by freeing the
reinterleave textures on full uninit instead.
Fixes#5447.
It was actually already implemented as ta_dup_ptrtype(), but that seems
like a clunky name. Also we still use the talloc_ names throughout the
source, and I'd rather use an old name instead of a mixing inconsistent
naming conventions.
mp_sws_set_from_cmdline() has the only purpose to respect the --sws-
command line options. Instead of forcing callers to get the option
struct containing these, let callers pass mpv_global, and get it from
the option core code directly. This avoids minor annoyances later on.
DR (direct rendering) works by having the decoder decode into the GPU
staging buffers, instead of copying the video data on texture upload. We
did this even for formats unsupported by the GPU or the renderer. This
"worked" because the staging memory is untyped, and the video frame was
converted by libswscale to a supported format, and then uploaded with a
copy using the normal non-DR texture upload path.
Even though it "works", we don't gain anything from using the staging
buffers for decoding, since we can't use them for upload anyway. Also,
staging memory might be potentially limited (what really happens is up
to the driver). It's easy to avoid, so just skip it in these cases.
The check_gl_features(p) call here checks whether dumb mode can be used.
It uses the field use_integer_conversion, which is set _after_ the call
in the same function. Move check_gl_features() to the end of the
function, when use_integer_conversion is finally set.
Fixes that it tried to use bilinear filtering with integer textures. The
bug disabled the code that is supposed to convert it to non-integer
textures.
This segfaults otherwise. The conditional is needed to break a circular
dependency (gl_init depends on mpgl_load_functions which depends on
recreate_dispmanx which calls gl_ctx_resize).
Fixes#5398
Remove the max_count creation parameter, because it's pointless and
rarely ever did anything. Add a talloc parent parameter instead (which
is something completely different, but convenient, and all callers needs
to be changed anyway).
Instead of clearing the pool when the now removed maximum is reached,
clear it on image parameter changes instead.
This enables DXVA2 hardware decoding with ra_d3d11. It should be useful
for Windows 7, where D3D11VA is not available. Images are transfered
from D3D9 to D3D11 using D3D9Ex surface sharing[1].
Following Microsoft's recommendations, it uses a queue of shared
surfaces, similar to Microsoft's ISurfaceQueue. This will hopefully
prevent surface sharing from impacting parallelism and allow multiple
D3D11 frames to be in-flight at once.
[1]: https://msdn.microsoft.com/en-us/library/windows/desktop/ee913554.aspx
In a lost device scenario, resize() will fail and p->backbuffer will be
NULL. We can't recover from lost devices yet, but we should still check
for a NULL backbuffer in start_frame() rather than crashing.
Also remove a NULL check for p->swapchain. This was a red herring, since
p->swapchain never becomes NULL in an error condition, but p->backbuffer
actually does.
This should fix the crash in #5320, but it doesn't fix the underlying
reason for the lost device (which is probably a driver bug.)
Previously, mpv would attempt to use a BGRA swapchain in the hope that
it would give better performance, since the Windows desktop is also
composited in BGRA. In practice, it seems like there is no noticable
performance difference between RGBA and BGRA swapchains and BGRA
swapchains cause trouble with a42b8b1142, which attempts to use the
swapchain format for intermediate FBOs, even though D3D11 does not
guarantee BGRA surfaces will work with UAV typed stores.
Uses the EGL width/height by default when the user fails to set
the android-surface-width/android-surface-height options.
This means the vo-resize command is optional, and does not need to
be implemented on android devices which do not support rotation.
Signed-off-by: Aman Gupta <aman@tmm1.net>
Apparently some Intel drivers have a bug where copying from staging
buffers to constant buffers does not work. We used to keep a copy of the
buffer data in a staging buffer to enable partial constant buffer
updates. To work around this bug, keep the copy in talloc-allocated
system memory instead.
There doesn't seem to be any noticable performance difference from
keeping the copy in system memory. Our cbuffers are probably too small
for it to matter anyway.
See also: https://crbug.com/593024Fixes#5293
This means that we now explicitly set an interval of 1. Although that
should be the EGL default, some drivers could possibly ignore this
(unconfirmed). In any case, this commit also allows disabling vsync, for
users who want it.
The queue family index and the queue info index are not necessarily the
same, so we're forced to do a check based on the queue family index
itself.
Fixes#5049
A vulkan validation layer update pointed out that this was wrong; we
still need to use the access type corresponding to the stage mask, even
if it means our code won't be able to skip the pipeline barrier (which
would be wrong anyway).
In additiona to this, we're also not allowed to specify any source
access mask when transitioning from top_of_pipe, which doesn't make any
sense anyway.
Async compute in particular seems to cause problems on some drivers, and
even when supprted the benefits are not that massive from the tests I
have seen, so it's probably safe to keep off by default.
Async transfer on the other hand seems to work better and offers a more
substantial improvement, so it's kept on.
This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to
situations where "more specialized" is ambiguous and the logic breaks
down. So to fix it, only compare the subset we care about.
blit() implies scaling, copy() is the equivalent command to use when the
formats are compatible (same pixel size) and the rects have the same
dimensions.
This allows RAs with support for non-opaque FBO formats to use a more
appropriate FBO format for the output tex, possibly enabling a more
efficient blit operation.
This requires distinguishing between real formats (which can be used to
create textures) and fake formats (e.g. ra_gl's FBO hack).
On AMD devices, we only get one graphics pipe but several compute pipes
which can (in theory) run independently. As such, we should prefer
compute shaders over fragment shaders in scenarios where we expect them
to be better for parallelism.
This is amusingly trivial to do, and actually improves performance even
in a single-queue scenario.
Instead of using a single primary queue, we generate multiple
vk_cmdpools and pick the right one dynamically based on the intent.
This has a number of immediate benefits:
1. We can use async texture uploads
2. We can use the DMA engine for buffer updates
3. We can benefit from async compute on AMD GPUs
Unfortunately, the major downside is that due to the lack of QF
ownership tracking, we need to use CONCURRENT sharing for all resources
(buffers *and* images!). In theory, we could try figuring out a way to
get rid of the concurrent sharing for buffers (which is only needed for
compute shader UBOs), but even so, the concurrent sharing mode doesn't
really seem to have a significant impact over here (nvidia). It's
possible that other platforms may disagree.
Our deadlock-avoidance strategy is stupidly simple: Just flush the
command every time we need to switch queues, and make sure all
submission and callbacks happen in FIFO order. This required lifting the
cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some
functions died/got moved as a result, but that's a relatively minor
change.
On my hardware this is a fairly significant performance boost, mainly
due to async transfers. (Nvidia doesn't expose separate compute queues
anyway). On AMD, this should be a performance boost as well due to async
compute.
This is especially interesting for vulkan since it allows completely
skipping the layout transition as part of the renderpass. Unfortunately,
that also means it needs to be put into renderpass_params, as opposed to
renderpass_run_params (unlike #4777).
Closes#4777.
This uses the new vk_signal mechanism to order all access to textures.
This has several advantageS:
1. It allows real synchronization of image access across multiple frames
when using multiple queues for parallelism.
2. It allows using events instead of pipeline barriers, which is a
finer-grained synchronization primitive that allows for more
efficient layout transitions over longer durations.
This commit also restructures some of the implicit transition code for
renderpasses to be more flexible and correct. (Note: this technically
drops the ability to transition the image out of undefined layout when
not blending, but that was a bug anyway and needs to be done properly)
vo_gpu: vulkan: remove no-longer-true optimization
The change to the output_tex format makes this no longer true, and it
actually seems to hurt performance now as well. So just don't do it
anymore. I also realized it hurts performance when drawing an OSD, so
it's probably not a good idea anyway.
This combines VkSemaphores and VkEvents into a common umbrella
abstraction which can resolve to either.
We aggressively try to prefer VkEvents over VkSemaphores whenever the
conditions are met (1. we can unsignal the semaphore, i.e. it comes from
the same frame; and 2. it comes from the same queue).
Instead of being submitted immediately, commands are appended into an
internal submission queue, and the actual submission is done once per
frame (at the same time as queue cycling). Again, the benefits are not
immediately obvious because nothing benefits from this yet, but it will
make more sense for an upcoming vk_signal mechanism.
This also cleans up the way the ra_vk submission interacts with the
synchronization/callbacks from the ra_vk_ctx. Although currently, the
way the dependency is signalled is a bit hacky: normally it would be
associated with the ra_tex itself and waited on in the appropriate stage
implicitly. But that code is just temporary, so I'm keeping it in there
for a better commit order.
Instead of associating a single VkSemaphore with every command buffer
and allowing the user to ad-hoc wait on it during submission, make the
raw semaphores-to-signal array work like the raw semaphores-to-wait-on
array. Doesn't really provide a clear benefit yet, but it's required for
upcoming modifications.
1. No more static arrays (deps / callbacks / queues / cmds)
2. Allows safely recording multiple commands at the same time
3. Uses resources optimally by never over-allocating commands
This hack was part of a solution to VSync judder in desktop OpenGL on
Windows. Rather than using blocking-SwapBuffers(), mpv could use
DwmFlush() to wait for the image to be presented by the compositor.
Since this would only work while the compositor was running, and the
compositor was silently disabled when OpenGL entered exclusive
fullscreen mode, mpv needed a way to detect exclusive fullscreen mode.
The code that is being removed could detect exclusive fullscreen mode by
checking the state of an undocumented mutex using undocumented native
API functions, but because of how fragile it was, it was always meant to
be removed when a better solution for accurate VSync in OpenGL was
found. Since then, mpv got the dxinterop backend, which uses desktop
OpenGL but has accurate VSync. It also got a native Direct3D 11 backend,
which is a viable alternative to OpenGL on Windows.
For people who are still using desktop OpenGL with WGL, there shouldn't
be much of a difference, since mpv can use other API functions to detect
exclusive fullscreen.
Refactored and split the `reinit_window_state` code into four
separate functions:
- `update_window_style` used to update window styles without
modifying the window rect.
- `fit_window_on_screen` used to adjust the window size when it is
larger than the screen size. Added a helper function `fit_rect` to
fit one rect on another without using any data from w32 struct.
- `update_fullscreen_state` used to calculate the new fullscreen
state and adjust the window rect accordingly.
- `update_window_state` used to display the window on screen with
new size, position and ontop state.
This commit fixes three issues:
- fixed#4753 by skipping `fit_window_on_screen` for a maximized
window, since maximized window should already fit on the screen.
It should be noted that this bug was only reproducible with
`--fit-border` option which is enabled by default. The cause of the
bug is that after calling the `add_window_borders` for a maximized
window, the rect in result is slightly larger than the screen rect,
which is okay, `SetWindowPos` will interpret it as a maximized state
later, so no auto-fitting to screen size is needed here.
- fixed#5215 by skipping `fit_window_on_screen` when leaving fullscreen.
On a multi-monitor system if the mpv window was stretched to cover
multiple monitors, its size was reset after switching back from
fullscreen to fit the size of the active monitor. Also, when changing
`--ontop` and `--border` options, now only the
`update_window_style` and `update_window_state` functions are used,
so `fit_window_on_screen` is not used for them too.
- fixed#2451 by moving the `ITaskbarList2_MarkFullscreenWindow`
below the `SetWindowPos`. If the taskbar is notified about fullscreen
state before the window is shown on screen, the taskbar button could
be missing until Alt-TAB is pressed, usually it was reproducible on
Windows 8.
Other changes:
- In `update_fullscreen_state` the `reset window bounds` debug
message now reports client area size and position, instead of window area
size and position. This is done for consistency with debug messages
in handling fullscreen state above in this function, since they also print
window bounds of the client area.
- Refactored `gui_thread_reconfig`. Added a new window flag `fit_on_screen`
to fit the window on screen even when leaving fullscreen. This is needed
for the case when the new video opened while the window is still in the
fullscreen state.
- Moved parent and fullscreen state checks out from the WM_MOVING to
`snap_to_screen_edges` function for consistency with other functions.
There's no point in keeping these checks out of the function body.
When window and screen size and position are stored in RECT, it's
much easier to modify them using WinAPI functions.
Added two macros to get width and height of the rect.