Commit Graph

45966 Commits

Author SHA1 Message Date
Niklas Haas 97b1482d53 vo_gpu: vulkan: fix sharing mode on malloc'd buffers
Might explain some of the issues in multi-queue scenarios?
2017-12-25 00:47:53 +01:00
Niklas Haas 5abe0bd593 vo_gpu: vulkan: fix dummyPass creation
This violates vulkan spec
2017-12-25 00:47:53 +01:00
Niklas Haas 58e201e6bd vo_gpu: vulkan: fix the rgb565a1 names -> rgb5a1
This is 5 bits per channel, not 565
2017-12-25 00:47:53 +01:00
Niklas Haas 286d421666 vo_gpu: vulkan: allow disabling async tf/comp
Async compute in particular seems to cause problems on some drivers, and
even when supprted the benefits are not that massive from the tests I
have seen, so it's probably safe to keep off by default.

Async transfer on the other hand seems to work better and offers a more
substantial improvement, so it's kept on.
2017-12-25 00:47:53 +01:00
Niklas Haas a6aab5dfd6 vo_gpu: vulkan: refine queue family selection algorithm
This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to
situations where "more specialized" is ambiguous and the logic breaks
down. So to fix it, only compare the subset we care about.
2017-12-25 00:47:53 +01:00
Niklas Haas 2d1769a534 vo_gpu: vulkan: prefer vkCmdCopyImage over vkCmdBlitImage
blit() implies scaling, copy() is the equivalent command to use when the
formats are compatible (same pixel size) and the rects have the same
dimensions.
2017-12-25 00:47:53 +01:00
Niklas Haas a42b8b1142 vo_gpu: attempt re-using the FBO format for p->output_tex
This allows RAs with support for non-opaque FBO formats to use a more
appropriate FBO format for the output tex, possibly enabling a more
efficient blit operation.

This requires distinguishing between real formats (which can be used to
create textures) and fake formats (e.g. ra_gl's FBO hack).
2017-12-25 00:47:53 +01:00
Niklas Haas 80540be211 vo_gpu: vulkan: properly depend on the swapchain acquire semaphore
This is now associated with the ra_tex directly and used in the correct
way, rather than hackily done from submit_frame.
2017-12-25 00:47:53 +01:00
Niklas Haas b138bdc01c vo_gpu: vulkan: use correct access flag for present
This needs VK_ACCESS_MEMORY_READ_BIT (spec)
2017-12-25 00:47:53 +01:00
Niklas Haas 8b0a111c59 vo_gpu: vulkan: make the swapchain more robust
Now handles both VK_ERROR_OUT_OF_DATE_KHR and VK_SUBOPTIMAL_KHR for both
vkAcquireNextImageKHR and vkQueuePresentKHR in the correct way.
2017-12-25 00:47:53 +01:00
Niklas Haas dcda8bd36a vo_gpu: aggressively prefer async compute
On AMD devices, we only get one graphics pipe but several compute pipes
which can (in theory) run independently. As such, we should prefer
compute shaders over fragment shaders in scenarios where we expect them
to be better for parallelism.

This is amusingly trivial to do, and actually improves performance even
in a single-queue scenario.
2017-12-25 00:47:53 +01:00
Niklas Haas bded247fb5 vo_gpu: vulkan: support split command pools
Instead of using a single primary queue, we generate multiple
vk_cmdpools and pick the right one dynamically based on the intent.
This has a number of immediate benefits:

1. We can use async texture uploads
2. We can use the DMA engine for buffer updates
3. We can benefit from async compute on AMD GPUs

Unfortunately, the major downside is that due to the lack of QF
ownership tracking, we need to use CONCURRENT sharing for all resources
(buffers *and* images!). In theory, we could try figuring out a way to
get rid of the concurrent sharing for buffers (which is only needed for
compute shader UBOs), but even so, the concurrent sharing mode doesn't
really seem to have a significant impact over here (nvidia). It's
possible that other platforms may disagree.

Our deadlock-avoidance strategy is stupidly simple: Just flush the
command every time we need to switch queues, and make sure all
submission and callbacks happen in FIFO order. This required lifting the
cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some
functions died/got moved as a result, but that's a relatively minor
change.

On my hardware this is a fairly significant performance boost, mainly
due to async transfers. (Nvidia doesn't expose separate compute queues
anyway). On AMD, this should be a performance boost as well due to async
compute.
2017-12-25 00:47:53 +01:00
Niklas Haas a3c9685257 vo_gpu: invalidate fbotex before drawing
Don't discard the OSD or pass_draw_to_screen passes though. Could be
faster on some hardware.
2017-12-25 00:47:53 +01:00
Niklas Haas 6186cc79e6 vo_gpu: allow invalidating FBO in renderpass_run
This is especially interesting for vulkan since it allows completely
skipping the layout transition as part of the renderpass. Unfortunately,
that also means it needs to be put into renderpass_params, as opposed to
renderpass_run_params (unlike #4777).

Closes #4777.
2017-12-25 00:47:53 +01:00
Niklas Haas fb1c7bde42 vo_gpu: vulkan: properly track image dependencies
This uses the new vk_signal mechanism to order all access to textures.
This has several advantageS:

1. It allows real synchronization of image access across multiple frames
   when using multiple queues for parallelism.

2. It allows using events instead of pipeline barriers, which is a
   finer-grained synchronization primitive that allows for more
   efficient layout transitions over longer durations.

This commit also restructures some of the implicit transition code for
renderpasses to be more flexible and correct. (Note: this technically
drops the ability to transition the image out of undefined layout when
not blending, but that was a bug anyway and needs to be done properly)

vo_gpu: vulkan: remove no-longer-true optimization

The change to the output_tex format makes this no longer true, and it
actually seems to hurt performance now as well. So just don't do it
anymore. I also realized it hurts performance when drawing an OSD, so
it's probably not a good idea anyway.
2017-12-25 00:47:53 +01:00
Niklas Haas f2f91cf570 vo_gpu: vulkan: add a vk_signal abstraction
This combines VkSemaphores and VkEvents into a common umbrella
abstraction which can resolve to either.

We aggressively try to prefer VkEvents over VkSemaphores whenever the
conditions are met (1. we can unsignal the semaphore, i.e. it comes from
the same frame; and 2. it comes from the same queue).
2017-12-25 00:47:53 +01:00
Niklas Haas 5feaaba0fd vo_gpu: vulkan: refactor command submission
Instead of being submitted immediately, commands are appended into an
internal submission queue, and the actual submission is done once per
frame (at the same time as queue cycling). Again, the benefits are not
immediately obvious because nothing benefits from this yet, but it will
make more sense for an upcoming vk_signal mechanism.

This also cleans up the way the ra_vk submission interacts with the
synchronization/callbacks from the ra_vk_ctx. Although currently, the
way the dependency is signalled is a bit hacky: normally it would be
associated with the ra_tex itself and waited on in the appropriate stage
implicitly. But that code is just temporary, so I'm keeping it in there
for a better commit order.
2017-12-25 00:47:53 +01:00
Niklas Haas 885497a445 vo_gpu: vulkan: reorganize vk_cmd slightly
Instead of associating a single VkSemaphore with every command buffer
and allowing the user to ad-hoc wait on it during submission, make the
raw semaphores-to-signal array work like the raw semaphores-to-wait-on
array. Doesn't really provide a clear benefit yet, but it's required for
upcoming modifications.
2017-12-25 00:47:53 +01:00
Niklas Haas 4e34615872 vo_gpu: vulkan: refactor vk_cmdpool
1. No more static arrays (deps / callbacks / queues / cmds)
2. Allows safely recording multiple commands at the same time
3. Uses resources optimally by never over-allocating commands
2017-12-25 00:47:53 +01:00
Martin Herkt ad50e640dc
Update VERSION 2017-12-25 00:23:13 +01:00
Martin Herkt dfac83a81d
Release 0.28.0 2017-12-25 00:18:05 +01:00
Ricardo Constantino b1b03da137 ytdl_hook: use table concat for playlist building
Faster and more efficient than string concat with large playlists.
2017-12-24 14:13:57 -07:00
Ricardo Constantino 1623430b20 ytdl_hook: don't preappend ytdl:// to non-youtube links in playlists
Close #5003
2017-12-24 14:13:57 -07:00
wm4 29af787217 player: update duration based on highest timestamp demuxed
This will help with things like livestreams.

As a minor detail, subtitles are excluded, because they sometimes have
"unused" events after video and audio ends. To avoid this annoying
corner case, just ignore them.
2017-12-24 21:49:12 +01:00
wm4 c12d897a3a player: allow seeking in cached parts of unseekable streams
Before this change and before the seekable stream cache became a thing,
we could possibly seek using the stream cache. But we couldn't know
whether the seek would succeed. We knew the available byte range, but
could in general not tell whether a demuxer would stay within the range
when trying to seek to a specific time position. We preferred to have
safe defaults, so seeking in streams that were detected as unseekable
were not honored. We allowed overriding this via --force-seekable=yes,
in which case it depended on your luck whether the seek would work, or
the player crapped its pants.

With the demuxer packet cache, we can tell exactly whether a seek will
work (at least if there's only 1 seek range). We can just let seeks go
through. Everything to allow this is already in place, and this commit
just moves around some minor things.

Note that the demux_seek() return value was not used before, because low
level (i.e. network level) seeks are usually asynchronous, and if they
fail, the state is pretty much undefined. We simply repurpose the return
value to signal whether cache seeking worked. If it didn't, we can just
resume playback normally, because demuxing continues unaffected, and no
decoder are reset.

This should be particularly helpful to people who for some reason stream
data into stdin via streamlink and such.
2017-12-24 21:45:12 +01:00
wm4 bf111f9c3c stream_libarchive: fix seeking fallback
In commit 1199c1e3, we added checks to every libarchive API call to make
sure the archive was closed on ARCHIVE_FATAL - otherwise, libarchive
could endow us with free CVEs (such as it apparently happens when you
continue reading a rar archive that uses features not yet supported by
libarchive).

This broke the fallback for seeking in unseekable archive formats. Of
course libarchive won't tell us directly whether a format implementation
has seek support or not - and OF COURSE it returns ARCHIVE_FATAL if it
has no seek support. (The error string, which you can retrieve via API,
is actually more detailed, and also claims it's an "internal error". I
don't think so, libarchive.) Returning ARCHIVE_FATAL means we have to
assume free CVEs are ahead, and we have to close the archive. Which
breaks the fallback in a dumb way (we have no way of telling which of
those cases happened anyway).

Fix this by assuming that all seek errors are potentially due to lack of
seek support. If the seek call fails, reopen the archive, and set a flag
so the seek API is never tried again. (This means we can still skip
ahead for forward seeks, which is more efficient than skipping from the
start of the archive entry.)

Also fix an old typo in an error message.
2017-12-24 21:33:16 +01:00
wm4 2a43060560 cache: propagate underlying stream seek errors in some cases
This just put the cache into an endless loop. This can happen simply if
any seek call of the underlying stream returns an error.
2017-12-24 21:33:16 +01:00
Martin Herkt 5ffbe2ba8a
command: use IEC symbols for file size formatting 2017-12-24 21:27:41 +01:00
Martin Herkt 79b9f48353
wscript: support static linking of shaderc
These idiots have no idea how to design a library, so we’ll have
to work around their bullshit for now.
2017-12-24 20:45:45 +01:00
wm4 a721dac601 player: make track language matching case insensitive
There is no reason not to do this, and probably saves the user some
trouble.

Mostly untested.

Closes #5272.
2017-12-23 15:14:13 -07:00
wm4 30686dcec3 demux_mkv: fix off by one error
Caused by the relatively recent change to packet parsing. This time it
was probably triggered by lace type 0, which reduces the byte length of
a 0 sized packet to 3 (timestamp + flag) instead of 4 (lace count for
other lace types). The thing about laces is just my guess why it worked
for other 0 sized packets, though.

Also remove the redundant and now incorrect check below.

Fixes #5271.
2017-12-23 14:07:21 -07:00
wm4 2c3a172ef1 demux: note refresh state separately in debug output
This log line tells us why the demuxer is trying to read more, which us
useful when debugging queue overflows. Probably barely useful, but I
think keeping that flag separately also makes the code slightly easier
to understand.
2017-12-23 00:32:59 +01:00
wm4 62cf960ef0 osc: show demuxer cache buffered amount in bytes
Same as previous commit, but for the OSC.

(A bit of a waste to request demuxer-cache-state at least twice per
frame, but the OSC queries so many properties it probably doesn't matter
anymore.)
2017-12-23 00:32:59 +01:00
wm4 822b247d10 player: show demuxer cache buffered amount in bytes in the status line
I don't want to add another field to display stream and demuxer cache
separately, so just add them up. This strangely makes sense, since the
forward buffered stream cache amount consists of data not read by the
demuxer yet. (If the demuxer cache has buffered the full stream, the
forward buffered stream cache amount is 0.)
2017-12-23 00:32:59 +01:00
wm4 a23a98f648 cache: lower default size to 2*10MB
Reduce it from 75MB in both directions (forward/backwards) to 10MB each.

The stream cache is kind of becoming useless in favor of the demuxer
cache. Using both doesn't make much sense, because they will contain
duplicated data for no reason.

Still leave it at 10MB, which may help with mp4 a bit. libavformat's mp4
demuxer tends to seek too much, so we try to avoid triggering network
level seeks by having some caching in the stream layer.
2017-12-23 00:32:59 +01:00
wm4 382a8ac0b0 demux: bump the demuxer cache readahead duration
Set it to 10 hours, which is practically unlimited. (Avoiding use of
"inf", since that might interact strangely with the option parser and
such.)
2017-12-23 00:32:59 +01:00
wm4 cc8759a855 demux: always discard cached packets on track switches
This fixes weird behavior in the following case:

- open a file
- make sure the max. demuxer forward cache is smaller than the
  file's video track
- make sure the max. readahead duration is larger than the file's
  duration
- disable the audio track
- seek to the beginning of the file
- once the cache has filled enable the audio track
- a queue overflow warning should appear
(- looking at the seek ranges is also interesting)

The queue overflow warning happens because the packed queue for the
video track will use up the full quota set by --demuxer-max-bytes. When
the audio track is enabled, reading an audio packet would technically
overflow the packet cache by the size of whatever packet is read next.

This means the demuxer signals EOF to the decoder, and once playback has
consumed enough video packets so that audio packets can be read again,
the decoder resumes from EOF. This interacts badly with A/V
synchronization and the whole thing can randomly crap itself until audio
has fully recovered.

We didn't care about this so far, but we want to raise the readahead
duration to something very high, so that the demuxer cache is fully
used. This means this case can be hit quite quickly by switching audio
or subtitle tracks, and is not really an obscure corner case anymore.

Fix this by always losing all cache. Since the cache can't be used
anyway until the newly selected track has been read, this is not much of
a disadvantage. The only thing that could be brought up is that
unselecting the track again could resume operation normally. (Maybe this
would be useful if network died completely without chance of recovery.
Then you could watch the already buffered video anyway by deselecting
the audio track again.) But given the headaches, this seems like the
better solution.

Unfortunately this requires adding new new strange fields and strangely
fragmenting state management functions again. I'm sure whoever works on
this in the future will hate me. Currently it seems like the lesser
evil, and much simpler and robust than the other potential solutions.

In case this needs to be revisited, here is a reminder for readers from
the future what alternative solutions were considered, without those
disadvantages:

A first attempted solution allowed the demuxer to buffer some additional
packets on track switching. This would allow it to read enough data to
feed the decoder at least. But it was still awkward, as it didn't allow
the demuxer to continue prefetching the newly selected track. It also
barely worked, because you could make the forward buffer "over full" by
seeking back with seekable cache enabled, and then it couldn't read
packets anyway.

As alternative solution, we could always demux and cache all tracks,
even if they're deselected. This would also not require a network-level
seek for the "refresh" logic (it's the thing that lets the video decoder
continue as if nothing happened, while actually seeking back in the
stream to get the missing audio packets, in the case of enabling a
previously disabled audio track). But it would also possibly waste
network and memory resources, depending on what the user actually wants.

A second solution would just account the queue sizes for each stream
separately. We could freely fill up the audio packet queue, even if the
video queue is full. Since the demuxer API returns interleaved packets
and doesn't let you predict which packet type comes next, this is not as
simple as it sounds, but it'd probably tie in nicely with the "refresh"
logic.

A third solution would be removing buffered video packets from the end
of the packet queue. Since the "refresh" logic gets these anyway, there
is no reason to keep them if they prevent the audio packet queue from
catching up with the video one. But this would require additional logic,
would interact badly with a bunch of other corner cases. And as far as
the code goes, it's rather complex, because all the logic is written
with FIFO behavior in mind (including the fact that the packet queue is
a singly linked list with no backwards links, making removal from the
end harder).
2017-12-23 00:32:59 +01:00
Ricardo Constantino 08dbc8f43e travis: stop excluding ffmpeg-git
Signed-off-by: Stefano Pigozzi <stefano.pigozzi@gmail.com>
2017-12-22 21:49:21 +01:00
wm4 2964788055
options: deprecate --ff- options and properties
Some old crap which nobody needs and which probably nobody uses.

This relies on a GCC extension: using "## __VA_ARGS__" to remove the
comma from the argument list if the va args are empty. It's supported
by clang, and there's some chance newer standards will introduce a
proper way to do this. (Even if it breaks somewhere, it will be a
problem only for 1 release, since I want to drop the deprecated
properties immediately.)
2017-12-21 19:51:30 +01:00
Scott Zeid 688768b9bd build: use a list instead of a string for rpi cflags
Using a string caused all 4 flags to be passed as one argument, causing
the build to fail when trying to include bcm_host.h.
2017-12-21 19:47:54 +01:00
wm4 3412c1a1aa
Restore Libav support
Libav has been broken due to the hwdec changes. This was always a
temporary situation (depended on pending patches to be merged), although
it took a bit longer. This also restores the travis config.

One code change is needed in vd_lavc.c, because it checks the AV_PIX_FMT
for videotoolbox (as opposed to the mpv format identifier), which is not
available in Libav. Add an ifdef; the affected code is for a deprecated
option anyway.
2017-12-21 19:45:32 +01:00
wm4 2ce7face96
hwdec: remove unused fields
These were replaced by a different mechanism, but the old fields weren't
removed.
2017-12-21 19:31:36 +01:00
Aman Gupta 7e2252688b vo_mediacodec_embed: implement hwcontext
Fixes vo_mediacodec_embed, which was broken in 80359c6615
2017-12-20 15:45:55 +11:00
wm4 eb619d0f57 command: make video-frame-info property observable
Pointed out as missing by someone. Not terribly useful, but here we go.
2017-12-20 15:28:23 +11:00
James Ross-Gowan 3d8ca93d23 vo_gpu: win: remove exclusive-fullscreen detection hack
This hack was part of a solution to VSync judder in desktop OpenGL on
Windows. Rather than using blocking-SwapBuffers(), mpv could use
DwmFlush() to wait for the image to be presented by the compositor.
Since this would only work while the compositor was running, and the
compositor was silently disabled when OpenGL entered exclusive
fullscreen mode, mpv needed a way to detect exclusive fullscreen mode.

The code that is being removed could detect exclusive fullscreen mode by
checking the state of an undocumented mutex using undocumented native
API functions, but because of how fragile it was, it was always meant to
be removed when a better solution for accurate VSync in OpenGL was
found. Since then, mpv got the dxinterop backend, which uses desktop
OpenGL but has accurate VSync. It also got a native Direct3D 11 backend,
which is a viable alternative to OpenGL on Windows.

For people who are still using desktop OpenGL with WGL, there shouldn't
be much of a difference, since mpv can use other API functions to detect
exclusive fullscreen.
2017-12-20 14:53:41 +11:00
pavelxdd d13f9d0886 w32_common: refactor and improve window state handling
Refactored and split the `reinit_window_state` code into four
separate functions:
- `update_window_style` used to update window styles without
modifying the window rect.
- `fit_window_on_screen` used to adjust the window size when it is
larger than the screen size. Added a helper function `fit_rect` to
fit one rect on another without using any data from w32 struct.
- `update_fullscreen_state` used to calculate the new fullscreen
state and adjust the window rect accordingly.
- `update_window_state` used to display the window on screen with
new size, position and ontop state.

This commit fixes three issues:
- fixed #4753 by skipping `fit_window_on_screen` for a maximized
window, since maximized window should already fit on the screen.
It should be noted that this bug was only reproducible with
`--fit-border` option which is enabled by default. The cause of the
bug is that after calling the `add_window_borders` for a maximized
window, the rect in result is slightly larger than the screen rect,
which is okay, `SetWindowPos` will interpret it as a maximized state
later, so no auto-fitting to screen size is needed here.
- fixed #5215 by skipping `fit_window_on_screen` when leaving fullscreen.
On a multi-monitor system if the mpv window was stretched to cover
multiple monitors, its size was reset after switching back from
fullscreen to fit the size of the active monitor. Also, when changing
`--ontop` and `--border` options, now only the
`update_window_style` and `update_window_state` functions are used,
so `fit_window_on_screen` is not used for them too.
- fixed #2451 by moving the `ITaskbarList2_MarkFullscreenWindow`
below the `SetWindowPos`. If the taskbar is notified about fullscreen
state before the window is shown on screen, the taskbar button could
be missing until Alt-TAB is pressed, usually it was reproducible on
Windows 8.

Other changes:
- In `update_fullscreen_state` the `reset window bounds` debug
message now reports client area size and position, instead of window area
size and position. This is done for consistency with debug messages
in handling fullscreen state above in this function, since they also print
window bounds of the client area.
- Refactored `gui_thread_reconfig`. Added a new window flag `fit_on_screen`
to fit the window on screen even when leaving fullscreen. This is needed
for the case when the new video opened while the window is still in the
fullscreen state.
- Moved parent and fullscreen state checks out from the WM_MOVING to
`snap_to_screen_edges` function for consistency with other functions.
There's no point in keeping these checks out of the function body.
2017-12-19 23:22:52 +11:00
pavelxdd ebd5ae3721 w32_common: use RECT for storing screen and window size & position
When window and screen size and position are stored in RECT, it's
much easier to modify them using WinAPI functions.
Added two macros to get width and height of the rect.
2017-12-19 23:22:52 +11:00
wm4 d690ee0959 client API: change --stop-playback-on-init-failure default
This was off for mpv CLI, but on for libmpv. The motivation behind this
was that it would be confusing for applications if libmpv continued
playback in a severely "degraded" way (without either audio or video),
and that it would be better to fail early.

In reality the behavior was just a confusing difference to mpv CLI, and
has confused actual users as well. Get rid of it.

Not bothering with a version bump, since this is so minor, and it's easy
to ensure compatibility in affected applications by just setting the
option explicitly.

(Also adding the missing next-release-marker in client-api-changes.rst.)
2017-12-17 15:45:24 -08:00
wm4 9ed8ca2529 vo_gpu: hwdec_drmprime_drm: don't crash for non-GL contexts
Using vulkan with --hwdec crashed because of this.
2017-12-17 11:00:51 -08:00
rim b0d0bc5700 dvb: Fix long channel switching: next/prev channel 2017-12-16 23:24:55 -08:00