This directly reads individual mkv sub-packets (block laces) into a
dedicated AVBufferRefs, which can be directly used for creating packets
without a additional copy of the packet data. This also means we switch
parsing of block header fields and lacing metadata to read directly from
the stream, instead of a memory buffer.
This could have been much easier if libavcodec didn't require padding
the packet data with zero bytes. We could just have each packet
reference a slice of the block data. But as it is, the only way to get
padding without a copy is to read the laces into individually allocated
(and padded) memory block, which required a larger rewrite.
This probably makes recovering from broken mkv files slightly worse if
the transport is unseekable. We just read, and then check if we've
overread. But I think that shouldn't be a real concern.
No actual measureable performance change. Potential for some
regressions, as this is quite intrusive, and touches weird obscure shit
like mkv lacing. Still keeping it because I like how it removes some
redundant EBML parsing functions.
This was probably the intention all along. But I honestly have no idea
what this code even does.
Due to what ebml_read_vlen_int() is used for, this is unlikely to have
mattered anyway as it rarely/never reads huge values. Which is probably
why this has worked for over a decade.
While unknown lengths are supported in some important cases like
segments and clusters, they are not for small and complex metadata
elements like the track list. Such elements are simply rejected.
This case was caught by the size sanity check below, but the message is
misleading and wrong.
(There are likely no files in the wild which require support for this.
The sample file I've seen was muxed by libavformat, but in a case where
it aborted when writing the header. Clearly a broken file.)
Integer and float elements are encoded as a sequence of bytes prefixed
by a variable-length encoded length specifier. If the length is 0, then
there is no data. Whether this is valid or not is not really clear, but
some sample files which do this have surfaced. It's not particularly
hard to handle this, so just do it.
Use char* for strings instead of bstr (data ptr + length pair). Matroska
actually (probably) allows "padding" strings with \0 bytes, so using
normal C strings instead of byte strings is more appropriate.
Reading IDs must be checked too. This was basically forgotten in commit
f3a978cd. Also set the *length parameter for ebml_parse_length() in some
error cases, which _really_ should happen.
Fixes#1461.
Apparently, originally this code was meant to be able to read past the
buffer somewhat, which is why the buffer allocation was padded by 8
byte. This is unclean and confuses valgrind. This probably could have
crashed with certain invalid files too.
Also revert the change added with 10a2f69; it should be not needed
anymore.
These actually are harmless. Even if the data the reader is working on
is essentially random, it's treated like untrusted input data, so there
should be no harm.
But it upsets tools like valgrind.
Probably fixes#1329.
Found by clang sanitizer. Casting unsigned integers to signed integers
with same size has implementation defined behavior (it's even allowed to
crash), but it seems reasonable to expect that reasonable
implementations do a complement of 2 "conversion".
This was once central, but now it's almost unused. Only vf_divtc still
uses it for extremely weird and incomprehensible reasons. The use in
stream.c is trivial. Replace these, and remove mpbswap.h.
bstr.c doesn't really deserve its own directory, and compat had just
a few files, most of which may as well be in osdep. There isn't really
any justification for these extra directories, so get rid of them.
The compat/libav.h was empty - just delete it. We changed our approach
to API compatibility, and will likely not need it anymore.
Some of these might be security relevant.
The RealAudio code was especially bad. I'm not sure if all RealAudio
stuff still plays correctly; I didn't have that many samples for
testing. Some checks might be unnecessary or overcomplicated compared
to the (obfuscated) nature of the code.
CC: @mpv-player/stable
Many ebml_read_* functions have a length int pointer parameter, which
returns the number of bytes skipped. Nothing actually needed this
(anymore), and code using it was rather hard to understand, so get rid
of them.
Matroska makes it pretty hard to resync correctly on broken files:
random data returns "valid" EBML IDs with a high probability, and when
trying to skip them it's likely that you skip a random amount of data
(instead of considering the element length invalid).
Improve upon this by skipping known level 1 elements only. Consider
everything else invalid and call the resync code. This might result in
annoying behavior when Matroska adds new level 1 elements, although it
won't be particularly harmful. Matroska doesn't really allow us to do
better (even mkvtoolnix explicitly checks for known level 1 elements).
Since we now don't always want to combine EBML element skipping and
resyncing, remove ebml_read_skip_or_resync_cluster(), and make
ebml_read_skip() more tolerant against skipping broken elements.
Also, don't resync when reading sub-elements, and instead do resyncing
when reading them results in an error.
Until now, corrupted files were detected if the size of an element (that
should be skipped) was larger than the remaining file. This still could
skip larger regions of the file itself if the broken size happened to be
within the file.
Change it so that it's never allowed to skip outside the parent's
element.
The TV code pretends to be part of stream/, but it's actually demuxer
code too. The audio_in code is shared between the TV code and
stream_radio.c, so stream_radio.c needs a small hack until stream.c is
converted.
In general, this warning can hint to actual bugs. We don't enable it
yet, because it would conflict with some unmerged code, and we should
check with clang too (this commit was done by testing with gcc).
The stream EOF flag should only be set when trying to read past the end
of the file (relatively similar to unix files). Always clear the EOF
flag on seeking. Trying to set it "properly" (depending whether data is
available at seek destination or not) might be an ok idea, but would
require attention to too many special cases. I suspect before this
commit (and in MPlayer etc. too), the EOF flag wasn't handled
consistently when the stream position was at the end of the file.
Fix one special case in ebml.c and stream_skip(): this function couldn't
distinguish between at-EOF and past-EOF either.
Fixes test7.mkv from the Matroska test file collection, as well as some
real broken files I've found in the wild. (Unfortunately, true recovery
requires resetting the decoders and playback state with a manual seek,
but it's still better than just exiting.)
If there are broken EBML elements, try harder to skip them correctly.
Do this by searching for the next cluster element. The cluster element
intentionally has a long ID, so it's a suitable element for
resynchronizing (mkvmerge does something similar).
We know that data is corrupt if the ID or length fields of an element
are malformed. Additionally, if skipping an unknown element goes past
the end of the file, we assume it's corrupt and undo the seek. Do this
because it often happens that corrupt data is interpreted as correct
EBML elements. Since these elements will have a ridiculous values in
their length fields due to the large value range that is possible
(0-2^56-2), they will go past the end of the file. So instead of
skipping them (which would result in playback termination), try to
find the next cluster instead. (We still skip unknown elements that
are within the file, as this is needed for correct operation. Also, we
first execute the seek, because we don't really know where the file
ends. Doing it this way is better for unseekable streams too, because
it will still work in the non-error case.)
This is done as special case in the packet reading function only. On
the other hand, that's the only part of the file that's read after
initialization is done.
Finish renaming directories and moving files. Adjust all include
statements to make the previous commit compile.
The two commits are separate, because git is bad at tracking renames
and content changes at the same time.
Also take this as an opportunity to remove the separation between
"common" and "mplayer" sources in the Makefile. ("common" used to be
shared between mplayer and mencoder.)
Tis drops the silly lib prefixes, and attempts to organize the tree in
a more logical way. Make the top-level directory less cluttered as
well.
Renames the following directories:
libaf -> audio/filter
libao2 -> audio/out
libvo -> video/out
libmpdemux -> demux
Split libmpcodecs:
vf* -> video/filter
vd*, dec_video.* -> video/decode
mp_image*, img_format*, ... -> video/
ad*, dec_audio.* -> audio/decode
libaf/format.* is moved to audio/ - this is similar to how mp_image.*
is located in video/.
Move most top-level .c/.h files to core. (talloc.c/.h is left on top-
level, because it's external.) Park some of the more annoying files
in compat/. Some of these are relicts from the time mplayer used
ffmpeg internals.
sub/ is not split, because it's too much of a mess (subtitle code is
mixed with OSD display and rendering).
Maybe the organization of core is not ideal: it mixes playback core
(like mplayer.c) and utility helpers (like bstr.c/h). Should the need
arise, the playback core will be moved somewhere else, while core
contains all helper and common code.