mirror of https://github.com/mpv-player/mpv synced 2025-01-02 04:42:10 +00:00

45759 Commits

Author SHA1 Message Date
wm4
1b0dc7d169 demux: use seekable cache for network by default, bump prefetch limit
The option for enabling it has now an "auto" choice, which is the
default, and which will enable it if the media is thought to be via
network or if the stream cache is enabled (same logic as --cache-secs).
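
A minimal sketch of how such a tri-state "auto" choice could be resolved. It assumes only the two inputs named above: whether the media is thought to come via network, and whether the stream cache is enabled. The enum and function names are hypothetical, not mpv's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state option value (no/yes/auto). */
enum seekable_cache_opt {
    SEEKABLE_CACHE_NO,
    SEEKABLE_CACHE_YES,
    SEEKABLE_CACHE_AUTO,
};

/* "auto" enables the seekable cache for network media, or when the
 * stream cache is enabled (same idea as for --cache-secs). */
static bool use_seekable_cache(enum seekable_cache_opt opt, bool is_network,
                               bool stream_cache_enabled)
{
    switch (opt) {
    case SEEKABLE_CACHE_NO:
        return false;
    case SEEKABLE_CACHE_YES:
        return true;
    case SEEKABLE_CACHE_AUTO:
        return is_network || stream_cache_enabled;
    }
    assert(!"invalid option value");
    return false;
}
```

Explicit yes/no always wins; only "auto" consults the media properties.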

Also bump the --cache-secs default from 10 to 120.
2017-11-10 16:30:43 +01:00
wm4
618b8a33e5 demux_mkv: fix potential uninitialized variable read 2017-11-10 12:49:53 +01:00
wm4
6bcdcaeeea demux: set default back buffer to some high value
Some back buffer is required to make the immediate forward range
seekable. This is because the back buffer limit is strictly enforced.
Just set a rather high back buffer by default. It's not used if
--demuxer-seekable-cache is disabled, so this is without risk.
2017-11-10 12:37:19 +01:00
wm4
b0de1ac36c demux: limit number of seek ranges to a static maximum
Limit the number of cached ranges to MAX_SEEK_RANGES, which is the same
number of maximally exported seek ranges. It makes no sense to keep
them, as the user won't see them anyway. Remove the smallest ones to
enforce the limit if the number grows too high.
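
A minimal sketch of the "remove the smallest ones" step, assuming "smallest" means shortest duration (it could equally be measured in bytes) and using an illustrative MAX_SEEK_RANGES value. The struct and function are hypothetical, and the sketch does not protect the range currently being read, which a real implementation presumably would have to.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; the real MAX_SEEK_RANGES may differ. */
#define MAX_SEEK_RANGES 4

/* Hypothetical cached range, described by its timestamps. */
struct cached_range {
    double start, end;
};

static double range_duration(const struct cached_range *r)
{
    return r->end - r->start;
}

/* Remove the shortest ranges until at most max_ranges remain. The order
 * of the surviving ranges is preserved. Returns the new count. */
static size_t limit_seek_ranges(struct cached_range *ranges, size_t num,
                                size_t max_ranges)
{
    assert(max_ranges > 0);
    while (num > max_ranges) {
        size_t smallest = 0;
        for (size_t i = 1; i < num; i++) {
            if (range_duration(&ranges[i]) < range_duration(&ranges[smallest]))
                smallest = i;
        }
        /* Close the gap left by the removed range. */
        for (size_t i = smallest; i + 1 < num; i++)
            ranges[i] = ranges[i + 1];
        num--;
    }
    return num;
}
```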
2017-11-10 12:32:40 +01:00
wm4
c8bb78bad8 demux: speed up cache seeking with a coarse index
Helps a little bit, I guess. But in general, t(h)rashing the cache kills
us anyway.

This has a fixed number of index entries. Entries are added/removed as
packets go through the packet queue. Only keyframes after index_distance
seconds are added. If there are too many keyframe packets, the existing
index is reduced by half, and index_distance is doubled. This should
provide somewhat even spacing between the entries.
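
The halving scheme described above can be sketched as follows. This is a hypothetical illustration over bare keyframe timestamps (the real index presumably points at packets), with an arbitrary INDEX_SIZE; all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative fixed index size; the real number of entries may differ. */
#define INDEX_SIZE 8

/* Hypothetical coarse index over keyframe timestamps (ascending). */
struct coarse_index {
    double pts[INDEX_SIZE];
    size_t num;
    double index_distance; /* minimum spacing between entries, in seconds */
};

/* Add a keyframe as it enters the packet queue (in PTS order). */
static void index_add(struct coarse_index *idx, double pts)
{
    assert(idx->index_distance > 0);
    if (idx->num > 0 && pts - idx->pts[idx->num - 1] < idx->index_distance)
        return; /* too close to the previous entry */
    if (idx->num == INDEX_SIZE) {
        /* Full: keep every second entry and double the spacing. */
        size_t out = 0;
        for (size_t i = 0; i < idx->num; i += 2)
            idx->pts[out++] = idx->pts[i];
        idx->num = out;
        idx->index_distance *= 2;
        if (pts - idx->pts[idx->num - 1] < idx->index_distance)
            return;
    }
    idx->pts[idx->num++] = pts;
}

/* Remove entries whose packets were pruned from the queue head. */
static void index_prune(struct coarse_index *idx, double min_pts)
{
    size_t skip = 0;
    while (skip < idx->num && idx->pts[skip] < min_pts)
        skip++;
    for (size_t i = skip; i < idx->num; i++)
        idx->pts[i - skip] = idx->pts[i];
    idx->num -= skip;
}

/* Position of the last entry <= target, or -1 if none. A seek can start
 * its linear packet search there instead of at the queue head. */
static int index_lookup(const struct coarse_index *idx, double target)
{
    int best = -1;
    for (size_t i = 0; i < idx->num && idx->pts[i] <= target; i++)
        best = (int)i;
    return best;
}
```

Halving keeps every other entry, so spacing stays roughly even while memory stays fixed.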
2017-11-10 12:17:34 +01:00
wm4
2485b899c3 demux: avoid wasting time by stopping packet search as early as possible
The packet queue is sorted, so we can stop the search if we have found a
packet, and the next packet in the queue has a higher PTS than the seek
PTS (for the sake of SEEK_FORWARD, we still consider the first packet
with a higher PTS).

Also, as a mostly cosmetic change, but which might be "faster", check
target for NULL, instead of target_diff for a magic float value.
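
A simplified sketch of the early-exit search over a PTS-sorted queue, using a NULL target as the "nothing found yet" state. The packet struct and function are hypothetical; the selection rule (closest keyframe at or before the seek PTS, or at or after it for a forward seek) is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet queue node. */
struct pkt {
    double pts;
    bool keyframe;
    struct pkt *next;
};

static struct pkt *find_seek_target(struct pkt *head, double seek_pts,
                                    bool forward)
{
    struct pkt *target = NULL; /* NULL: nothing found yet */
    double target_diff = 0;
    for (struct pkt *p = head; p; p = p->next) {
        if (!p->keyframe)
            continue;
        double diff = forward ? p->pts - seek_pts : seek_pts - p->pts;
        if (diff >= 0 && (!target || diff < target_diff)) {
            target = p;
            target_diff = diff;
        }
        /* The queue is sorted, so once we have a target and this packet
         * is past the seek PTS, nothing later can be better. For forward
         * seeks the packet itself was considered just above. */
        if (target && p->pts > seek_pts)
            break;
    }
    assert(!target || target->keyframe);
    return target;
}
```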
2017-11-10 12:11:33 +01:00
wm4
968a24772e demux: simplify remove_packet() function
Turns out this is only ever used to remove the head element anyway.
2017-11-10 11:35:19 +01:00
wm4
65d36013dd demux: fix failure to join ranges with subtitles in some cases
Subtitle streams are sparse, and no overlap is required to correctly
join two cached ranges. This was not correctly handled iff the two
ranges had different subtitle packets close to the join point.
2017-11-10 11:06:07 +01:00
wm4
4681b4b28b demux: reverse which range is reused when joining them
Which one to use doesn't really matter, but reusing the first one will
probably be slightly more convenient later on.
2017-11-10 10:46:54 +01:00
wm4
f123cc4c9b demux: fix a race condition with async seeking
demux_add_packet() must completely ignore any packets that are added
while a queued seek has not been initiated yet.

The main issue is that after setting in->seeking==true, the central lock
is released, and it can take "a while" until it's reacquired on the
demux thread and the seek is actually initiated. During that time,
packets that have nothing to do with the new state could be read and
added.
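
A heavily simplified sketch of the fix, assuming a single mutex-protected flag and a counter standing in for the packet queue. All names are hypothetical; the point is only that the add path checks the flag under the same lock that the seek request sets it under, so packets read before the seek actually runs are dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified demuxer state. */
struct demux_state {
    pthread_mutex_t lock;
    bool seeking;    /* a seek was queued but not initiated yet */
    int num_packets; /* stand-in for the packet queue */
};

/* Demux thread: append a packet unless a queued seek is pending. */
static bool add_packet(struct demux_state *s)
{
    pthread_mutex_lock(&s->lock);
    bool accepted = !s->seeking;
    if (accepted)
        s->num_packets++;
    pthread_mutex_unlock(&s->lock);
    return accepted;
}

/* Player thread: queue a seek and flush the queue. */
static void queue_seek(struct demux_state *s)
{
    pthread_mutex_lock(&s->lock);
    s->seeking = true;
    s->num_packets = 0;
    pthread_mutex_unlock(&s->lock);
}

/* Demux thread, some time later: actually initiate the seek. */
static void execute_seek(struct demux_state *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->seeking);
    /* ... the low-level seek would happen here ... */
    s->seeking = false;
    pthread_mutex_unlock(&s->lock);
}
```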
2017-11-10 10:23:49 +01:00
wm4
d3bc93cf2e demux: get rid of an unnecessary field 2017-11-10 10:15:37 +01:00
wm4
4a6b04bdb9 vo_gpu: never pass flipped images to ra or ra backends
Seems like the last refactor to this code broke playing flipped images,
at least with --opengl-pbo --gpu-api=opengl.

Flipping is rather a shitmess. The main problem is that OpenGL does not
support flipped uploading. The original vo_gl implementation considered
it important to handle the flipped case efficiently, so instead of
uploading the image line by line backwards, it uploaded it flipped, and
then flipped it in the renderer (basically the upload path ignored the
flipping). The ra code and backends probably have an insane and
inconsistent mix of semantics, so fix this by never passing it flipped
images in the first place.

In the future, the backends should probably support flipped images
directly.

Fixes #5097.
2017-11-10 10:06:33 +01:00
wm4
ed1af99727 demux: attempt to accurately reflect seek range with muxed subtitles
If subtitles are part of the stream, determining the seekable range
becomes harder. Subtitles are sparse, and can have packets in irregular
intervals, or even completely lack packets. The usual logic of computing
the seek range by the min/max packet timestamps fails.

Solve this by making the only assumption we can make: subtitle packets
are implicitly demuxed along with other packets. We also assume perfect
interleaving for this, but you really can't do anything with sparse
packets that makes sense without this assumption.

One special case is if we prune sparse packets within the current
seekable range. Obviously this should limit the seekable range to after
the pruned packet.
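
One way to read the rule above, sketched under assumptions: the seekable range is the overlap of all non-sparse streams, sparse streams do not narrow it, and a pruned sparse packet moves the start up to that packet (inclusive here, for simplicity). The struct, field names and -INFINITY convention are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream summary of cached packet timestamps. */
struct stream_ts {
    double min_ts, max_ts; /* first/last cached timestamp */
    bool sparse;           /* e.g. a subtitle stream */
    double last_pruned_ts; /* sparse only: last pruned packet, or -INFINITY */
};

static bool seek_range(const struct stream_ts *s, size_t n,
                       double *start, double *end)
{
    assert(start && end);
    double lo = -INFINITY, hi = INFINITY;
    bool have_dense = false;
    /* Overlap of all non-sparse streams. */
    for (size_t i = 0; i < n; i++) {
        if (s[i].sparse)
            continue;
        if (s[i].min_ts > lo)
            lo = s[i].min_ts;
        if (s[i].max_ts < hi)
            hi = s[i].max_ts;
        have_dense = true;
    }
    /* Sparse streams only matter if they pruned something in range. */
    for (size_t i = 0; i < n; i++) {
        if (s[i].sparse && s[i].last_pruned_ts > lo)
            lo = s[i].last_pruned_ts;
    }
    if (!have_dense || lo > hi)
        return false;
    *start = lo;
    *end = hi;
    return true;
}
```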
2017-11-10 08:57:37 +01:00
wm4
7c1e7468e6 demux: reduce indentation for two functions
Remove the single exit, which added a huge if statement containing
everything, just for some corner case.
2017-11-10 04:41:56 +01:00
wm4
6493ef2e34 demux: some minor mostly cosmetics
None of these change functionality in any way (although the log level
changes obviously change the terminal output).
2017-11-10 04:36:50 +01:00
wm4
a0b6f805f9 demux: simplify a function
update_stream_selection_state() doesn't need all these arguments. Not
sure what I was thinking here.
2017-11-10 04:08:15 +01:00
wm4
8c5ef33044 demux: change how refreshes on track switching are handled
Instead of weirdly deciding this on every packet read and having the
code far away from where it's actually needed, just run it where it's
actually needed.
2017-11-10 03:56:24 +01:00
wm4
25ed7ff0c8 demux: get rid of weird backwards loop
A typical idiom for calling functions that remove something from the
array being iterated, but it's not needed here. I have no idea why this
was ever done.
2017-11-10 03:26:40 +01:00
wm4
9c330b53e3 demux: avoid broken readahead when joining ranges
Setting ds->refreshing for unselected streams could lead to a
nonsensical queue overflow warning, because read_packet() took it as a
reason to continue reading.

Also add some more information to the queue overflow warning (even if
that one doesn't have anything to do with this bug - it was for
unselected streams only).
2017-11-10 03:19:25 +01:00
wm4
c494049e76 demux: reduce difference between threaded and non-threaded mode
This fixes an endless loop with threading disabled, for example
when playing a file with an external subtitle file, and seeking to the
beginning. Something will set in->seeking, but the seek is never
executed, which made demux_read_packet() loop endlessly. (External
subtitles will use non-threaded mode for whatever reasons.)

Fix this by making the unthreaded code execute the worker thread
body, which reduces the difference in logic.
2017-11-10 02:55:02 +01:00
wm4
935e406d63 demux: support multiple seekable cached ranges
Until now, the demuxer cache was limited to a single range. Extend this
to multiple ranges. Should be useful for slow network streams.

This commit changes a lot in the internal demuxer cache logic, so
there's a lot of room for bugs and regressions. The logic without
demuxer cache is mostly untouched, but also involved with the code
changes. Or in other words, this commit probably fucks up shit.

There are two things which make multiple cached ranges rather hard:

1. the need to resume the demuxer at the end of a cached range when
   seeking to it
2. joining two adjacent ranges when the lower range "grows" into it (and
   resuming the demuxer at the end of the new joined range)

"Resuming" the demuxer means that we perform a low level seek to the end
of a cached range, and properly append new packets to it, without adding
packets multiple times or creating holes due to missing packets.

Since audio and video never line up exactly, there is no clean "cut"
possible, at which you could resume the demuxer cleanly (for 1.) or
which you could use to detect that two ranges are perfectly adjacent
(for 2.). The way the demuxer interleaves multiple streams is also
unpredictable. Typically you will have to expect that it randomly allows
one of the streams to be ahead by a bit, and so on.

To deal with this, we have heuristics in place to detect when one packet
equals or is "behind" a packet that was demuxed earlier. We reuse the
refresh seek logic (used to "reread" packets into the demuxer cache when
enabling a track), which checks for certain packet invariants.
Currently, it observes whether either the raw packet position, or the
packet DTS is strictly monotonically increasing. If neither is true, we
discard old ranges when creating a new one.
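
A sketch of the invariant tracking described above, assuming per-packet byte position and DTS with hypothetical "unset" markers (libavformat may leave the position unset, and not every format has DTS). The struct and function names are illustrative, not mpv's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "unset" markers. */
#define NO_POS (-1)
#define NO_DTS (-1e300)

struct pkt_info {
    int64_t pos; /* byte position in the file */
    double dts;
};

/* Tracks whether pos and/or DTS have been strictly increasing so far. */
struct mono_check {
    bool pos_ok, dts_ok;
    bool have_last;
    struct pkt_info last; /* last packet appended to the cached range */
};

static void mono_update(struct mono_check *c, struct pkt_info p)
{
    if (p.pos == NO_POS || (c->have_last && p.pos <= c->last.pos))
        c->pos_ok = false;
    if (p.dts == NO_DTS || (c->have_last && p.dts <= c->last.dts))
        c->dts_ok = false;
    c->last = p;
    c->have_last = true;
}

/* If neither invariant holds, old ranges have to be discarded. */
static bool mono_can_keep_ranges(const struct mono_check *c)
{
    return c->pos_ok || c->dts_ok;
}

/* After resuming at the end of a range: is this packet already cached? */
static bool mono_is_behind(const struct mono_check *c, struct pkt_info p)
{
    assert(mono_can_keep_ranges(c) && c->have_last);
    if (c->pos_ok)
        return p.pos <= c->last.pos;
    return p.dts <= c->last.dts;
}
```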

This heavily depends on the file format and the demuxer behavior. For
example, not all file formats have DTS, and the packet position can be
unset due to libavformat not always setting it (e.g. when parsers are
used).

At the same time, we must deal with all the complicated state used to
track prefetching and seek ranges. In some complicated corner cases, we
just give up and discard other seek ranges, even if the previously
mentioned packet invariants are fulfilled.

To handle joining, we're being particularly dumb, and require a small
overlap to be confident that two ranges join perfectly. (This could be
done incrementally with as little overlap as 1 packet, but corner cases
would eat us: each stream needs to be joined separately, and the cache
pruning logic could remove overlapping packets for other streams again.)

Another restriction is that switching the cached range will always
trigger an asynchronous low level seek to resume demuxing at the new
range. Some users might find this annoying.

Dealing with interleaved subtitles is not fully handled yet. It will
clamp the seekable range to where subtitle packets are.
2017-11-09 10:23:57 +01:00
James Ross-Gowan
bd4ec8e4e1 appveyor: update ffmpeg and test d3d11/vulkan
Build ffmpeg-mpv, shaderc and crossc from source, since they are not
packaged in MSYS2. Also, add some more explicit --enable flags to the
mpv build to make sure things like D3D11, D3D11VA hwaccels and Vulkan
are auto-detected.
2017-11-08 07:22:54 +11:00
James Ross-Gowan
e7bf5576e5 vo_gpu: hwdec_d3d11va: allow zero-copy video decoding
Like the manual says, this is technically undefined behaviour. See:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff476085.aspx

In particular, MSDN says texture arrays created with the BIND_DECODER
flag cannot be used with CreateShaderResourceView, which means they
can't be sampled through SRVs like normal Direct3D textures. However,
some programs (Google Chrome included) do this anyway for performance
and power-usage reasons, and it appears to work with most drivers.

Older AMD drivers had a "bug" with zero-copy decoding, but this appears
to have been fixed. See #3255, #3464 and http://crbug.com/623029.
2017-11-07 20:27:13 +11:00
James Ross-Gowan
b258d82d6e vo_gpu: d3d11: enhance cache invalidation
The shader cache in ra_d3d11 caches the result of shaderc, crossc and
the D3DCompiler DLL, so it should be invalidated when any of those
components are updated. This should make the cache more reliable, which
makes it safer to enable gpu-shader-cache-dir. Shader compilation is
slow with D3D11, so gpu-shader-cache-dir is highly necessary.
2017-11-07 20:27:13 +11:00
James Ross-Gowan
b9c1286893 vo_gpu: d3d11: log shader compilation times
Some shaders take a _long_ time to compile with the Direct3D compiler.
The ANGLE backend had this problem too, to a certain extent. Logging
should help identify which shaders cause long stalls and could also help
with benchmarking ways of reducing compile times.
2017-11-07 20:27:13 +11:00
James Ross-Gowan
4b014b3a81 vo_gpu: move d3d11_screenshot to shared code
This can be used by the ANGLE backend and ra_d3d11.
2017-11-07 20:27:13 +11:00
James Ross-Gowan
9b2dae79b1 vo_gpu: d3d11: add RA caps for ra_d3d11
ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is
then translated to HLSL. This means it always exposes the same GLSL
version that the SPIR-V compiler supports (4.50 for shaderc/glslang.)

Despite claiming to support GLSL 4.50, some features that are tied to
the GLSL version in OpenGL are not supported by ra_d3d11 when targeting
legacy Direct3D feature levels.

This includes two features that mpv relies on:
- Reading from gl_FragCoord in the fragment shader (requires FL 10_0)
- textureGather from any texture component (requires FL 11_0)

These features have been exposed as new RA caps.
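
A sketch of deriving such caps from the Direct3D feature level. The feature-level constants mirror the values of Direct3D's D3D_FEATURE_LEVEL enum (e.g. 10_0 is 0xa000); the capability names and helper functions are hypothetical stand-ins for the new RA caps.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror Direct3D's D3D_FEATURE_LEVEL enum. */
enum feature_level {
    FL_9_1 = 0x9100,
    FL_9_3 = 0x9300,
    FL_10_0 = 0xa000,
    FL_11_0 = 0xb000,
};

/* Hypothetical capability bits for the two features listed above. */
enum {
    CAP_FRAGCOORD = 1 << 0, /* read gl_FragCoord in fragment shaders */
    CAP_GATHER = 1 << 1,    /* textureGather from any component */
};

static unsigned caps_for_feature_level(unsigned fl)
{
    assert(fl >= FL_9_1);
    unsigned caps = 0;
    if (fl >= FL_10_0)
        caps |= CAP_FRAGCOORD;
    if (fl >= FL_11_0)
        caps |= CAP_GATHER;
    return caps;
}

/* The renderer checks caps instead of trusting the GLSL version. */
static bool can_use_gather(unsigned caps)
{
    return caps & CAP_GATHER;
}
```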
2017-11-07 20:27:13 +11:00
James Ross-Gowan
68eac1a1e7 vo_gpu: d3d11: initial implementation
This is a new RA/vo_gpu backend that uses Direct3D 11. The GLSL
generated by vo_gpu is cross-compiled to HLSL with SPIRV-Cross.

What works:

- All of mpv's internal shaders should work, including compute shaders.

- Some external shaders have been tested and work, including RAVU and
  adaptive-sharpen.

- Non-dumb mode works, even on very old hardware. Most features work at
  feature level 9_3 and all features work at feature level 10_0. Some
  features also work at feature level 9_1 and 9_2, but without high-bit-
  depth FBOs, it's not very useful. (Hardware this old is probably not
  fast enough for advanced features anyway.)

  Note: This is more compatible than ANGLE, which requires 9_3 to work
  at all (GLES 2.0) and 10_1 for non-dumb mode (GLES 3.0).

- Hardware decoding with D3D11VA, including decoding of 10-bit formats
  without truncation to 8-bit.

What doesn't work / can be improved:

- PBO upload and direct rendering do not work yet. Direct rendering
  requires persistent-mapped PBOs because the decoder needs to be able
  to read data from images that have already been decoded and uploaded.
  Unfortunately, it seems like persistent-mapped PBOs are fundamentally
  incompatible with D3D11, which requires all resources to use driver-
  managed memory and requires memory to be unmapped (and hence pointers
  to be invalidated) when a resource is used in a draw or copy
  operation.

  However it might be possible to use D3D11's limited multithreading
  capabilities to emulate some features of PBOs, like asynchronous
  texture uploading.

- The blit() and clear() operations don't have equivalents in the D3D11
  API that handle all cases, so in most cases, they have to be emulated
  with a shader. This is currently done inside ra_d3d11, but ideally it
  would be done in generic code, so it can take advantage of mpv's
  shader generation utilities.

- SPIRV-Cross is used through a NIH C-compatible wrapper library, since
  it does not expose a C interface itself.

  The library is available here: https://github.com/rossy/crossc

- The D3D11 context could be made to support more modern DXGI features
  in future. For example, it should be possible to add support for
  high-bit-depth and HDR output with DXGI 1.5/1.6.
2017-11-07 20:27:13 +11:00
James Ross-Gowan
8020a62953 vo_gpu: export the GLSL format qualifier for ra_format
Backported from @haasn's change to libplacebo, except in the current RA,
there's nothing to indicate an ra_format can be bound as a storage
image, so there's no way to force all of these formats to have a
glsl_format. Instead, the layout qualifier will be removed if
glsl_format is NULL.

This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while
loading float values from unorm images often works as expected, it's
technically undefined behaviour, and in Windows 10, it will cause the
debug layer to spam the log with error messages. Also, apparently in
GLSL, the format name must match the image's format exactly (but in
Direct3D, it just has to have the same component type.)
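
A sketch of the "drop the layout qualifier if glsl_format is NULL" rule when emitting a storage image declaration. The helper name and declaration shape are hypothetical. The image is declared writeonly because desktop GLSL requires a format qualifier on images used for loads or atomics.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emit a write-only storage image declaration; the format layout
 * qualifier is only added when glsl_format is non-NULL. */
static int image_decl(char *buf, size_t size, const char *glsl_format,
                      const char *name)
{
    assert(buf && name);
    if (glsl_format)
        return snprintf(buf, size, "layout(%s) writeonly uniform image2D %s;",
                        glsl_format, name);
    return snprintf(buf, size, "writeonly uniform image2D %s;", name);
}
```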
2017-11-07 20:27:13 +11:00
James Ross-Gowan
41dff03f8d vo_gpu: add namespace query mechanism
Backported from @haasn's change to libplacebo. More flexible than the
previous "shared || non-shared" distinction. The extra flexibility is
needed for Direct3D 11, but it also doesn't hurt code-wise.
2017-11-07 20:27:13 +11:00
wm4
793b43020c vo_lavc: remove messy delayed subtitle rendering logic
For some reason vo_lavc's draw_image can buffer the frame and encode it
only later. Also, there is logic for rendering the OSD (i.e. subtitles)
only when needed.

In theory this can lead to subtitles being pruned before it tries to
render them (as the subtitle logic doesn't know that the VO still needs
them later), although this probably never happens in reality.

The worse issue, that actually happened, is that if the last frame gets
buffered, it attempts to render subtitles in the uninit callback. At
this point, the subtitle decoder is already torn down and all subtitles
removed, thus it will draw nothing. This didn't always happen. I'm not
sure why - potentially in the working cases, the frame wasn't buffered.

Since this logic doesn't have much worth, except a minor performance
advantage if frames with subtitles are dropped, just remove it.

Hopefully fixes #4689.
2017-11-07 05:29:26 +01:00
wm4
a2a623ebb9 player: change license of some code surrounding --frames to LGPL
The original author of the patch has agreed now.
2017-11-06 20:53:27 +01:00
wm4
921073bf86 img_format: remove some guards against old ffmpeg API
These are always present in ffmpeg-mpv, and never in Libav.
2017-11-06 17:14:01 +01:00
wm4
4ef0887f7b demux: explicitly discard 0 sized packets
libavcodec can't deal with them, because its API doesn't distinguish
between 0 sized packets and sending EOF. As such, keeping them doesn't
do any good, ever. This actually fixes some obscure mkv sample (see
previous commit).
2017-11-06 17:13:42 +01:00
wm4
e598b19dad demux_mkv: allow 0 sized packets
Fixes some obscure sample that uses fixed size laces with 0-sized lace
size. Some broken shit. (Maybe the decoder wouldn't care about these
packets, but the demuxer attempted to resync after these packet reading
errors, even though they were perfectly recoverable. But I don't care
enough about this.)

Sample link: https://samples.ffmpeg.org/Matroska/switzler084d_dl.mkv
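
With fixed-size lacing, Matroska divides the remaining block data equally among the laced frames. A block with no payload therefore yields 0-sized laces, which are valid under this rule. A sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Split payload_size bytes into num_laces equal laces. Returns false
 * if the payload cannot be divided evenly (a broken block). */
static bool split_fixed_laces(size_t payload_size, size_t num_laces,
                              size_t *sizes)
{
    assert(num_laces > 0);
    if (payload_size % num_laces)
        return false;
    for (size_t i = 0; i < num_laces; i++)
        sizes[i] = payload_size / num_laces; /* may legitimately be 0 */
    return true;
}
```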
2017-11-06 17:12:58 +01:00
wm4
7334d93b30 demux: slightly simplify pruning
We can compute the overhead size easily now - no need for awkward
incremental updates to cached values on top of it.
2017-11-06 17:11:31 +01:00
wm4
9e1fbffc37 demux_mkv: rewrite packet reading to avoid 1 memcpy()
This directly reads individual mkv sub-packets (block laces) into
dedicated AVBufferRefs, which can be directly used for creating packets
without an additional copy of the packet data. This also means we switch
parsing of block header fields and lacing metadata to read directly from
the stream, instead of a memory buffer.

This could have been much easier if libavcodec didn't require padding
the packet data with zero bytes. We could just have each packet
reference a slice of the block data. But as it is, the only way to get
padding without a copy is to read the laces into individually allocated
(and padded) memory blocks, which required a larger rewrite.
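
A sketch of the "read into an individually allocated, padded block" idea with plain stdio. PADDING stands in for libavcodec's AV_INPUT_BUFFER_PADDING_SIZE, whose real value depends on the FFmpeg version. The real code wraps such blocks in AVBufferRefs, while this sketch just uses malloc(); read_padded() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for AV_INPUT_BUFFER_PADDING_SIZE (version-dependent). */
#define PADDING 64

/* Read exactly size bytes into a new buffer with PADDING zero bytes
 * appended, so the data never has to be copied again just to pad it. */
static uint8_t *read_padded(FILE *f, size_t size)
{
    assert(f);
    uint8_t *buf = malloc(size + PADDING);
    if (!buf)
        return NULL;
    if (fread(buf, 1, size, f) != size) {
        free(buf); /* short read: truncated or broken input */
        return NULL;
    }
    memset(buf + size, 0, PADDING);
    return buf;
}
```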

This probably makes recovering from broken mkv files slightly worse if
the transport is unseekable. We just read, and then check if we've
overread. But I think that shouldn't be a real concern.

No actual measurable performance change. Potential for some
regressions, as this is quite intrusive, and touches weird obscure shit
like mkv lacing. Still keeping it because I like how it removes some
redundant EBML parsing functions.
2017-11-05 18:13:34 +01:00
wm4
8aa1db3b17 demux: refactoring in preparation for multiple seek range support
This adds a bunch of stuff (mostly unused or redundant) as preparation
for supporting multiple seek ranges. Actual support is probably still
far away.

One change that messes deeper with the actual code is that we account
for total buffered bytes instead of just the backwards bytes now. This
way, prune_old_packets() doesn't have to iterate over all seek ranges to
determine whether something needs pruning.
2017-11-04 23:45:21 +01:00
wm4
10d0963d85 demux: improve and optimize cache pruning and seek range determination
The main purpose of this commit is avoiding any hidden O(n^2) algorithms
in the code for pruning the demuxer cache, and for determining the
seekable boundaries of the cache. The old code could loop over the whole
packet queue on every packet pruned in certain corner cases.

There are two ways to reach the goal:
 1) commit a cardinal sin
 2) do everything incrementally

The cardinal sin is adding an extra field to demux_packet, which caches
the determined seekable range for a keyframe range. demux_packet is a
rather general data structure and thus shouldn't have any fields that
are not inherent to its use, and are only needed as an implementation
detail of code using it. But what are you gonna do, sue me?

In the future, demux.c might have its own packet struct though. Then the
other existing cardinal sin (the "next" field, from MPlayer times) could
be removed as well.

This commit also changes slightly how the seek end is determined. There
is a note on the manpage in case anyone finds the new behavior
confusing. It's somewhat cleaner and might be needed for supporting
multiple ranges (although that's unclear).
2017-11-04 23:18:42 +01:00
wm4
6e998be7be demux: reduce overhead when searching over keyframe ranges
The demuxer cache seeking mechanism looks at keyframe ranges to
determine the earliest PTS of a packet. Instead of looping over all
packets twice (once to find the next keyframe, a second time to find the
seek PTS), do it in one go.

For that, basically turn recompute_keyframe_target_pts() into an
iteration function. Functionality should be unchanged with this commit.
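
A sketch of the single-pass idea over a plain array. Every keyframe starts a new range, and the range's seek PTS is the lowest PTS of any packet in it, since reordered B-frames can have a lower PTS than the keyframe. The struct and function are hypothetical, not the real recompute_keyframe_target_pts().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pkt {
    double pts;
    bool keyframe;
};

/* One pass in demux order. Packets before the first keyframe are
 * ignored. Writes one PTS per keyframe range to out and returns the
 * number of ranges. */
static size_t keyframe_range_pts(const struct pkt *pkts, size_t num,
                                 double *out)
{
    assert(pkts || num == 0);
    size_t ranges = 0;
    for (size_t i = 0; i < num; i++) {
        if (pkts[i].keyframe)
            out[ranges++] = pkts[i].pts; /* start a new range */
        else if (ranges > 0 && pkts[i].pts < out[ranges - 1])
            out[ranges - 1] = pkts[i].pts; /* earlier reordered frame */
    }
    return ranges;
}
```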
2017-11-04 18:18:42 +01:00
wm4
75cdd13e29 player: log if NDEBUG is defined
I sure want to know whether assert()s were unexpectedly not compiled in.
2017-11-04 17:48:30 +01:00
wm4
104e18214c demux: avoid excessive readahead after cache seek
The base_ts field is used to guess the decoder position, and when set to
NOPTS, it just reads ahead arbitrarily. Also demux_add_packet() sets
base_ts to the new timestamp when appending a packet, which would also
make it readahead by a too large amount.

Fix this by setting base_ts after a seek. This assumes that normally, a
cached seek target will always have the timestamp set. This is actually
not quite clear (as it calls recompute_keyframe_target_pts(), which
looks at multiple packets), but maybe it works well enough.
2017-11-04 17:43:22 +01:00
wm4
c1f981f863 demux: make pruning more efficient for unseekable demuxer cache
Don't do any of the extra work related to pruning the backbuffer if
demuxer cache seeking is disabled.

As a small extra, always prune packets with no timestamps immediately,
or queue heads that are not keyframes. (Both of them would be pruned
from the backbuffer by the normal logic anyway.)
2017-11-04 17:32:34 +01:00
wm4
d46c9c2e62 demux: on queue overflow wake up reader thread on EOF only
This seems to make more sense.
2017-11-03 16:36:59 +01:00
wm4
49ae599883 demux: don't show queue overflow warning when merely prefetching
If fulfilling --demuxer-readahead-secs requires more memory than allowed
by --demuxer-max-bytes, the queue obviously overflows. But the warning
is normally only for the case when trying to find the next video or
audio packet fails, and decoding can't continue.

Use a separate variable for determining whether to prefetch, and if the
queue has overflowed, skip the message. In fact, skip the EOF setting and
waking up the decoder thread as well, because the EOF flag should not be
(have been) set in this situation, and waking up the reader thread helps
only if the EOF state changed.
2017-11-03 16:36:21 +01:00
wm4
5261d1b099 vo_gpu: don't re-render hwdec frames when repeating frames
Repeating frames (for display-sync) is not supposed to render the entire
frame again. When using hardware decoding, it unfortunately did: the
renderer uses the frame ID to check whether the frame data changed, and
unmapping the hwdec frame clears it.

Essentially reverts commit 761eeacf5407cab07. Back then I probably
thought it would be a good idea to release the hwdec image quickly in
order to return it to the decoder, but they're referenced anyway.

This should increase the performance and reduce GPU work.
2017-11-03 15:11:56 +01:00
wm4
4fca1856e1 demux: don't allow subtitles to mess up buffered time display
In a shit show of subtle corner case interactions, making the demuxer
cache buffer the entire file can display a small buffered time if
subtitles are enabled. The reason is that some subtitle decoders may
read in advance infinitely, i.e. they read the entire subtitle stream.
Then the other streams (audio/video) have logically reached EOF, while
the subtitle stream is set to ds->active==true. This will have to be
fixed properly later to account for buffering of subtitle-only files
(another corner case) correctly, but for now this is less annoying.
2017-11-03 14:58:13 +01:00
wm4
36630585f6 osc: make cycling visibility an input.conf key binding
As a builtin script, it should not register global key bindings, but add
them to input.conf instead. This is similar to what stats.lua does.
2017-11-03 14:41:18 +01:00
wm4
57248915fa demux: add option to create CC tracks eagerly
We don't hope to auto-detect them at load time, as that would be too
much of a pain - even FFmpeg requires fetching and parsing of video
packets, and exposes the information only via deprecated API.

But there still needs to be a way to select them by default. This is
also needed to get the first CC packet at all (without seeking back).

This commit also attempts to clean up locking a bit, which is a PITA,
but it's better to be careful & clean.
2017-11-03 13:55:32 +01:00
wm4
99dd2f57f0 vo_gpu: ra_gl: fix minimum GLSL version to 120
Not sure why there was 110, or why there is even a default.
2017-11-03 11:53:31 +01:00