Commit Graph

773 Commits

Author SHA1 Message Date
Andreas Rheinhardt 40d9cbdc22 avformat/matroskadec: Use AV_DICT_DONT_STRDUP_VAL to save av_strdup
This will likely also fix CID 1452562, a false positive resulting from
Coverity thinking that av_dict_set() automatically frees its key and
value parameters (even without the AV_DICT_DONT_STRDUP_* flags).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-01 16:38:28 +01:00
Andreas Rheinhardt 2ff687c17f avformat/matroskadec: Fix lzo decompression
When a Matroska Block is only stored in compressed form, the size of
the uncompressed block is not explicitly coded and therefore not known
before decompressing it. Therefore the demuxer uses a guess for the
uncompressed size: The first guess is three times the compressed size
and if this is not enough, it is repeatedly incremented by a factor of
three. But when this happens with lzo, the decompression is neither
resumed nor started again. Instead when av_lzo1x_decode indicates that x
bytes of input data could not be decoded, because the output buffer is
already full, the first (not the last) x bytes of the input buffer are
resent for decoding in the next try; they overwrite already decoded
data.

This commit fixes this by instead restarting the decompression anew,
just with a bigger buffer.

This seems to be a regression since 935ec5a1.

A FATE-test for this has been added.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-28 22:40:13 -03:00
Andreas Rheinhardt af50f0a515 avformat/matroskadec: Fix use-after-free when demuxing ProRes
ProRes in Matroska is supposed to not contain the first atom header
(containing a size field and the tag "icpf") and therefore the Matroska
demuxer has to recreate it; this involves an allocation and copy, of
course. Whether the old buffer (containing the data without the atom
header) needs to be freed or not depends upon whether it is what was
directly read (in which case it is owned by an AVBuffer) or whether it
has been allocated when reversing the track's content compression (e.g.
zlib compression) that Matroska supports.

So there are three pointers involved: The one pointing to the directly
read data (owned by the AVBuffer), the one pointing to the currently
valid data (which coincides with the former if no content compression
needed to be reverted) and the one pointing to the new data with the
first atom header. The check for whether to free the second of these is
simply whether the first two are different.

This works mostly, but there is a complication: Some muxers don't strip
the first atom header away and in this case, it is also not reinserted
and no new buffer is allocated; instead, the second and the third
pointers agree. In this case, one must never free the second buffer.
Yet it is currently done if the track is e.g. zlib compressed.
This commit fixes this.

This is a regression since b8e75a2a.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-07 12:36:21 -03:00
Andreas Rheinhardt d5274f86a8 avformat/matroskadec: Reuse AVIOContext
When parsing EBML lacing, for every number read, a new AVIOContext has
been initialized (via ffio_init_context()) just for this number. This
has been changed: The context is kept now.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt dbe3be6744 avformat/matroskadec: Improve frame size parsing error messages
When parsing the sizes of the frames in a lace fails, sometimes no
error message was raised (e.g. when using xiph or fixed-size lacing).
Only EBML lacing generated error messages (which were wrongly declared
as AV_LOG_INFO), but even here not all errors resulted in an error
message. So add a generic error message to catch them all.

Moreover, if parsing one of the EBML numbers fails, ebml_read_num already
emits its own error messages, so that all that is needed is a generic error
message to indicate that this happened during parsing the sizes of the
frames in a block; in other words, the error messages specific to
parsing EBML lace numbers can be and have been removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt f74eaa17bb avformat/matroskadec: Remove unnecessary check
870e7552 introduced validating the lace sizes when they are parsed and
removed the old check; yet when merging this libav commit in 6902c3ac,
the old check for whether the frame extends beyond the frame has been kept.
It is unnecessary and has been removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt a69f92a946 avformat/matroskadec: Simplify control flow of parsing laces
Up until now, when an error happened in one of the inner loops in
matroska_parse_laces, a variable designated for the return value has
been set to an error value and break has been used to exit the
current loop/case. This was done so that the end of matroska_parse_laces
is reached, because said function allocated memory which is later used
and freed in the calling function and passed at the end of
matroska_parse_laces.

But given that there is no allocation any more, one can now return
immediately. And this commit does this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt 9ad1a6d64c avformat/matroskadec: Avoid allocating array for lace sizes
The maximal number of frames in a lace can be 256; hence one has a not
excessive upper bound on the size of an array that can hold the sizes of
all the frames in a lace. Yet up until now, said array has been
dynamically allocated. This has been changed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt 668490ac98 avformat/matroskadec: Use bytestream API instead of AVIOContext
It avoids the overhead of function calls.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt eec26b5911 avformat/matroskadec: avcodec/tta: Set extradata_size to 22
Up until c4e0e314, the seek table has been included in the tta
extradata, so that the size of said extradata was 22 (the size of a TTA1
header) + 4 * number of frames. The decoder rejected anything below a
size of 30 and so the Matroska demuxer exported 30 byte long extradata,
of which only 18 were set (it ignores a CRC-32 and simply leaves it at
0). But this is unnecessary since said commit, so reduce the size to 22.

Furthermore, replace 30 by 22 in a comment about the extradata size in
libavcodec/tta.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt f7bf59b431 avformat/matroskadec: Check before allocations
That way one doesn't have to free later. In this case (concerning TTA
extradata), this also fixes a memleak when the output samplerate is
invalid.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-12-04 23:11:37 -03:00
Andreas Rheinhardt dbc50f8a93 avformat/matroskadec: Fix default value of BlockAddID
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-11-20 11:22:14 -03:00
James Almer 3b4e9a31ea avformat/matroskadec: don't rescale mastering display values
Simplifies code.

Signed-off-by: James Almer <jamrial@gmail.com>
2019-10-05 22:37:41 -03:00
Andreas Rheinhardt 581419ea39 avformat/matroskadec: Fix demuxing ProRes
The structure of a ProRes frame in mov/mp4 is that of a typical atom:
First a 32 bit BE size field, then a tag detailling the content. Said
size field includes the eight bytes of the atom header.

This header is actually redundant, as the size of the atom is already
known from the containing atom. It is therefore stripped away when muxed
into Matroska and so the Matroska demuxer has to recreate upon demuxing.
But it did not account for the fact that the size field includes the
size of the header and this can lead to problems when a decoder uses the
in-band size field.

Fixes ticket #8210.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-10-04 00:06:30 -03:00
Michael Niedermayer fccc37ca85 repeat an even number of characters in occured
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-16 00:04:18 +02:00
James Almer 3b3150c45f avformat/matroskadec: use av_fast_realloc to reallocate ebml list arrays
Speeds up the process considerably.

Fixes ticket #8109.

Suggested-by: nevcairiel
Suggested-by: cehoyos
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-04 10:08:17 -03:00
James Almer f34aabfbae avformat/matroskadec: use proper types for some EbmlSyntax fields
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-04 10:07:13 -03:00
Andreas Rheinhardt c294f38c91 avformat/matroskadec: Fix seeking
matroska_reset_status (a function that is used during seeking (among
other things)) used an int for the return value of avio_seek which
returns an int64_t. Checking the return value then indicated an error
even though the seek was successfull for targets in the range of
2GB-4GB, 6GB-8GB, ... This error implied that the status hasn't been
reset and in particular, the old level was still considered to be in
force, so that ebml_parse returned errors because the newly parsed
elements were of course not contained in the previously active and still
wrongly considered active master element any more.

Addresses ticket #8084.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-08-16 21:33:54 -03:00
Andreas Rheinhardt 9869e21776 matroskadec: Remove redundant const
The typedef used to define EbmlSyntax already includes a const qualifier
so that it is unnecessary to include another const qualifier in future
definitions and declarations. Given that MSVC warns about this, this
commit removes these redundant const qualifiers.

Suggested-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-18 22:26:00 +02:00
Andreas Rheinhardt ab4795a085 matroskadec: Add sizes to forward declarations
Unknown-length elements end when an element not allowed in them, but
allowed at a higher level is encountered. In order to check for this,
c1abd95a added a pointer to every syntax level's parent to each
EbmlSyntax. Given that the parent must of course also reference the
child in order to be able to enter said child level, one needs to use
forward declarations.
These forward declarations constitute tentative definitions and tentative
definitions with internal linkage (like our syntaxes) must not be an
incomplete type. Yet they were an incomplete type and while GCC and
Clang did not even warn about this (on default warning levels), it
broke compilation with MSVC. Therefore this commit adds the sizes.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-18 09:38:10 +02:00
Andreas Rheinhardt 806ac7da69 avformat/matroskadec: Improve check for level 1 duplicates
If a file uses unknown-length level 1 elements besides clusters and such
elements are after the first cluster, then these elements will usually
be parsed twice: Once during parsing of the file header and once when
reading the file reaches the position where these elements are located.
The second time the element is parsed leads to a "Duplicate element"
error message. Known-length elements are not affected by this as they
are skipped except during parsing the header.

This commit fixes this by explicitly adding a check for whether the
position of the element to be parsed is the same as the position of the
already known level 1 element.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:18:10 -03:00
Andreas Rheinhardt 730ac1ae80 avformat/matroskadec: Use file offsets for level 1 elements
This commit converts the MatroskaLevel1Element struct to use file-based
offsets, as opposed to the current practice of using offsets relative to
the beginning of the segment in it. This also includes a change from
uint64_t to int64_t.

This is in preparation to another patch that improves the check for
duplicate level 1 elements.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:18:10 -03:00
Andreas Rheinhardt 6854127a76 avformat/matroskadec: Reindent after previous commit
Also use the smallest scope possible for a loop variable.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:18:10 -03:00
Andreas Rheinhardt 71c908817e avformat/matroskadec: Accept more unknown-length elements II
Up until now, one last kind of unknown-length element hasn't been
properly handled: Unknown-length elements that are supposed to be
skipped, i.e. the level 1 elements that might reside after the
clusters.

This commit changes this. To do this, ebml_parse got a mode that
essentially tries to skip everything except when parsing is needed
(namely for unknown-length elements for which parsing is necessary
as they can't be skipped). This mode is selected by using a NULL
as destination where the parsed data should be written to.
It is used to parse the level 1 elements in matroska_parse_cluster.

The syntax list used for parsing must of course include links to
the syntax of all the master elements that might need to be parsed.
In other words: Instead of matroska_clusters (which contained every
level 1 element except clusters as EBML_NONE elements designated to
be skipped) matroska_segment is needed and used; matroska_clusters has
been removed.

Furthermore, matroska_segment has been reordered so that clusters are at
the front as this is now the most common case for this list.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:18:09 -03:00
Andreas Rheinhardt 9c6d14ab84 avformat/matroskadec: Fix probing of unknown-length headers
matroska_probe did not support the case of an unknown-length EBML header
at all; given that libavformat's Matroska muxer used to produce such
files in the streaming case, support for them has been added.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:17:00 -03:00
Andreas Rheinhardt 3c70b941d5 avformat/matroskadec: Accept more unknown-length elements
The current Matroska specifications mandate that only two elements may
use an unknown-length length: Segments and clusters. But this was not
always so: For the greater part of Matroska's existence, all master
elements were allowed to make use of the unknown-length feature.

And there were muxers creating such files: For several years
libavformat's Matroska muxer used unknown-length for all master
elements when the output wasn't seekable. This only stopped in March
2010 with 2529bb30. And even afterwards it was possible (albeit
unlikely) for libavformat to create unknown-length master elements
that are in violation of today's specifications, namely if the master
element was so big that the seek backwards to update the size could
no longer be performed inside the AVIOContext's write buffer. This
has only been fixed in October 2016 (with the patches that introduced
support for writing CRC-32 elements).

Libavformat's Matroska demuxer meanwhile has never really supported
unknown-length elements besides segments and clusters. Support for the
latter was hardcoded. This commit changes this: Now all master elements
for which a syntax to parse them is available are supported. This
includes the files produced by old versions of libavformat's muxer.

More precisely, master elements that have unknown length and are about
to be parsed (not skipped) are supported; only a warning is emitted for
them. For normal files, this means that level 1 elements after the
clusters that are encountered after the clusters have been parsed (i.e.
not because they are referenced by the seekhead at the beginning of the
file) are still unsupported (they would be skipped at this point if
their length were known).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:17:00 -03:00
Andreas Rheinhardt 04b62bd7ce avformat/matroskadec: Improve invalid length error handling
1. Up until now, the error message for EBML numbers whose length exceeds
the limits imposed upon them because of the element's type did not
distinguish between known-length and unknown-length elements. As a
consequence, the numerical value of the define constant
EBML_UNKNOWN_LENGTH was emitted as part of the error message which is
of course not appropriate. This commit changes this by adding error
messages designed for unknown-length elements.

2. We impose some (arbitrary) sanity checks on the lengths of certain
element types; these checks were conducted before the checks depending
on whether the element exceeds its containing master element. Now the
order has been reversed, because a failure at the (formerly) latter
check implies that the file is truly erroneous and not only fails our
arbitrary length limit. Moreover, this increases the informativeness of
the error messages.

3. Furthermore, the error message in general has been changed by replacing
the type of the element (something internal to this demuxer and
therefore suitable as debug output at best, not as an error message
intended for ordinary users) with the element ID. The element's position
has been added, too.

4. Finally, the length limit for EBML_NONE elements has been changed so
that all unknown-length elements of EBML_NONE-type trigger an error.
This is done because unknown-length elements can't be skipped and need
to be parsed, but there is no syntax to parse available for EBML_NONE
elements. This is done in preparation for a further patch which allows
more unknown-length elements than just clusters and segments.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:17:00 -03:00
Andreas Rheinhardt 5120305137 avformat/matroskadec: Don't skip too much when unseekable
The Matroska (and WebM) file format achieves forward-compability by
insisting that demuxers ignore and skip elements they don't know about.
Unfortunately, this complicates the detection of errors as errors
resulting from loosing sync can't be reliably distinguished from
unknown elements that are part of a future version of the standard.

Up until now, the strategy to deal with this situation was to skip all
unknown elements that are not obviously erroneous; if an error happened,
it was tried to seek to the last known good position to resync from (and
resync to level 1 elements). This is working fine if the input is
seekable, but if it is not, then the skipped data can usually not be
rechecked lateron. This is particularly acute if unknown-length clusters
are in use, as the check for whether a child element exceeds the
containing master element is ineffective in this situation.

To remedy this, a new heuristic has been introduced: If an unknown
element is encountered in non-seekable mode, an error is presumed to
have happened based upon a combination of the length of the row of the
already encountered unknown elements and of how far away skipping this
element would take us.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:17:00 -03:00
Andreas Rheinhardt 60f75c9976 avformat/matroskadec: Typos, nits and cosmetics
Cosmetics include reordering EbmlType so that EBML_SINT is adjacent to
the other numbers (and matches the order in the switch in ebml_parse)
and also reordering the switch for assignment of default values so that
it matches the order in EbmlType.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:17:00 -03:00
Andreas Rheinhardt 7087fc95b2 avformat/matroskadec: Reuse positions
Up until now, avio_tell was used multiple times in ebml_parse and its
subroutines, although the result of these calls can usually be simply
derived from the result of earlier calls to avio_tell. This has been
changed. Unnecessary calls to avio_tell in ebml_parse are avoided now.

Furthermore, there has been a slight change in the output of some error
messages relating to elements exceeding their containing master element:
The reported position of the element now points to the first byte of the
element ID and no longer to the first byte of the element's payload.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:59 -03:00
Andreas Rheinhardt 3ed2755baa avformat/matroskadec: Redo EOF handling
This commit closes the last hole in the system of checks for a
known-length file ending too early: Now an error message is emitted
in case the file ends directly after an EBML element.

Furthermore, this commit adds a check and a corresponding warning
whether there is data beyond the Matroska segment (only reasonable for
known-length segments). If everything looks alright, then parsing is
stopped as soon as EOF is reached (in contrast, the earlier code would
always call matroska_resync at the end).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:59 -03:00
Andreas Rheinhardt 38255cdcf8 avformat/matroskadec: Combine arrays
By including SimpleBlocks and BlockGroups twice in the same EbmlSyntax
array (with different semantics), one can reduce the duplication of the
other values.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:59 -03:00
Andreas Rheinhardt a9f051519e avformat/matroskadec: Don't reset cluster position
The new code does not rely on whether the cluster's position is set or
not to infer whether a cluster needs to be closed or not (instead, this
is done in ebml_parse), so there is no need to reset the cluster's
position at all any more. It will be automatically set to the correct
value when a cluster is entered.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:59 -03:00
Andreas Rheinhardt 865c537007 avformat/matroskadec: Make cluster parsing level compatible
Before this commit, the parsing of clusters mixed EBML levels by
allowing elements from different levels in a EbmlSyntax (namely
matroska_cluster_parsing). This has been changed. And the level
is now explicitly used to determine how to parse.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:59 -03:00
Andreas Rheinhardt b31c9b72e5 avformat/matroskadec: Redo level handling
This commit changes how levels are handled: If the level used for
ebml_parse ends directly after an element that has been consumed, then
ebml_parse ends the level itself (and any known-length levels that end
there as well) and informs the caller via the return value; if the
current level is of unknown-length, then the level is ended as soon as
an element that is not valid on the current level, but on a higher
level is encountered (or if EOF has been encountered).

This is designed for situations where one wants to parse master elements
incrementally, i.e. not in one go via ebml_parse_nest.

The (incremental) parsing of clusters still mixes levels by using a
syntax list that contains elements from different levels and the level
is still ended manually via a call to ebml_level_end if the last cluster
was an unknown-length cluster (known-length clusters are already ended
when their last element is read), but only if the next element is a
cluster, too. A  different level 1 element following an unknown-length
cluster will currently simply be presumed to be part of the earlier
cluster. Fixing this will be done in a future patch. The modifications
to matroska_parse_cluster contained in this patch are only intended not
to cause regressions.

Nevertheless, the fact that known-length levels are automatically ended
in ebml_parse when their last element has been read already fixes a bogus
error message introduced in 9326117b that was emitted when a known-length
cluster is followed by another level 1 element other than a cluster in
which case the cluster's level was not ended (which only happened when
a new cluster has been encountered) so that the length check (introduced
in 9326117b) failed for the level 1 element as it is of course not
contained in the previous cluster. Most Matroska files were affected by
this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt c1abd95ad0 avformat/matroskadec: Link to parents in syntax tables
By linking to the syntax of the parent (i.e. the containing master
element) one can check whether an element is actually part of a higher
level in the EBML hierarchy. Knowing this is important for
unknown-length levels, because they end when an element that doesn't
belong to this, but to a higher hierarchy level is encountered.

Sometimes there are different syntaxes dealing with the same elements.
In this case it is important to use a parent that contains all the
elements at the parent level; whether this is the syntax actually used
to enter the child's level is irrelevant. This affects the list of level
1 elements (which has been used as parent for matroska_cluster, too) and
it affects recursive elements (currently only the SimpleTag), where the
non-recursive parent has to be choosen.

This is in preparation for a patch that redoes level handling.

Finally, the segment id has been added to ebml_syntax. This will enable
handling of unknown-length EBML headers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt a3db9f62a4 avformat/matroskadec: Introduce a "last known good" position
Currently, resyncing during reading packets works as follows:
The current position is recorded, then a call to matroska_parse_cluster
is made and if said call fails, the demuxer tries to resync from the
earlier position. If the call doesn't fail, but also doesn't deliver a
packet, then this is looped.

There are two problems with this approach:
1. The Matroska file format aims to be forward-compatible; to achieve
this, a demuxer should simply ignore and skip elements it doesn't
know about. But it is not possible to reliably distinguish unknown
elements from junk. If matroska_parse_cluster encounters an unknown
element, it can therefore not simply error out; instead it returns zero
and the loop is iterated which includes an update of the position that
is intended to be used in case of errors, i.e. the element that is
skipped is not searched for level 1 element ids to resync to at all if
later calls to matroska_parse_cluster return an error.
Notice that in case that sync has been lost there can be a chain of
several unknown/possibly junk elements before an error is detected.

2. Even if a call to matroska_parse_cluster delivers a packet, this does
not mean that everything is fine. E.g. it might be that some of the
block's data is missing and that the data that was presumed to be from
the block just read actually contains the beginning of the next element.
This will only be apparent at the next call of matroska_read_packet,
which uses the (false) end of the earlier block as resync position so
that in the (not unlikely) case that the call to matroska_parse_cluster
fails, the data believed to be part of the earlier block is not searched
for a level 1 element to resync to.

To counter this, a "last known good" position is introduced. When an
element id that is known to be allowed at this position in the hierarchy
(according to the syntax currently in use for parsing) is read and some
further checks (regarding the length of the element and its containing
master element) are passed, then the beginning of the current element is
treated as a "good" position and recorded as such in the
MatroskaDemuxContext. Because of 2., only the start of the element is
treated as a "good" position, not the whole element. If an error occurs
later during parsing of clusters, the resync process starts at the last
known good position.

Given that when the header is damaged the subsequent resync never skips over
data and is therefore unaffected by both issues, the "last known good"
concept is not used there.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt 559e3422c7 avformat/matroskadec: Refactor some functions
Since the changes to the parsing of SimpleBlocks, both ebml_parse_id and
ebml_parse_elem are only called from one place, so that it is possible
to inline these two function calls. This is done, but not completely:
ebml_parse_id still exists in a modified form. This is done in
preparation for a further patch regarding the handling of
unknown-length elements.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt 8a286e745d avformat/matroskadec: Use proper levels after discontínuity
The earlier code set the level to zero upon seeking and after a
discontinuity although in both cases parsing (re)starts at a level 1
element.

Also set the segment's length to unkown if an error occured in order not
to drop any valid data that happens to be beyond the designated end of
the segment.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt 310f326b43 avformat/matroskadec: Add function to reset status
This function will be useful later to reset the status (e.g. current
level and the already parsed id).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:57 -03:00
Andreas Rheinhardt 27f40b1dcd avformat/matroskadec: Don't abort resyncing upon seek failure
When an error happens, the Matroska demuxer tries to resync to level 1
elements from an earlier position onwards. If the seek to said earlier
position fails, the demuxer currently treats this as an unrecoverable
error. And that behaviour is suboptimal as said failure is nothing
unrecoverable or unexpected (when the input isn't seekable).
It is preferable to simply resync from the earliest position available
(i.e. the start of the AVIOContext's buffer) onwards if the seek failed.

Here are some scenarios that might be treated as unrecoverable errors
by the current code if the input isn't seekable. They all have in
common that the current position is so far away from the desired
position that the seek can't be fulfilled from the AVIOContext's buffer:

1. Blocks (both SimpleBlocks as well as a Block in a BlockGroup) for
which reading them as binary EBML elements succeeds, but whose parsing
triggers an error (e.g. an invalid TrackNumber). In this case the
earlier position from which resyncing begins is at the start of the block
(or even earlier).
2. BlockGroups, whose parsing fails in one of the latter elements. Just
as in 1., the start of the BlockGroup (the target of the seek) might be
so far away from the current position that it is no longer in the
buffer.
3. At the beginning of parsing a cluster, the cluster is parsed until a
SimpleBlock or a BlockGroup is encountered. So if the input is damaged
between the beginning of the cluster and the first occurrence of a
SimpleBlock/BlockGroup and if said damage makes the demuxer read/skip so
much data that the beginning of the cluster is no longer in the buffer,
demuxing will currently fail completely.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:57 -03:00
Andreas Rheinhardt eb33be188d matroskadec: Fix overflow introduced in a569a7b3
This commit fixes an overflow introduced in a569a7b3 that affected EBML
elements that the Matroska demuxer doesn't want to parse like CRC-32
elements. The return value of avio_skip (the new position on success or
an AVERROR on failure) has been assigned to an integer which meant that
new positions in the range of 2GB to 4GB-1 etc. were considered errors.

Fixes ticket #8001.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-07-06 14:48:54 -03:00
Andreas Rheinhardt ff5ea59f7b avformat/matroskadec: Improve error/EOF checks III
Up until now, when an element was skipped, it was relied upon
ffio_limit to make sure that there is enough data available to skip.
ffio_limit itself relies upon the availability of the file's size. As
this needn't be available, the check has been refined: First one byte
less than intended is skipped, then another byte is read, followed by a
check of the error flags.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt a569a7b3bb avformat/matroskadec: Improve read error/EOF checks II
This commit fixes a number of bugs:

1. There was no check that no read error/EOF occured during
ebml_read_uint, ebml_read_sint and ebml_read_float.
2. ebml_read_ascii and ebml_read_binary did sometimes not forward
error codes; instead they simply returned AVERROR(EIO).
3. In particular, AVERROR_EOF hasn't been used and no dedicated error
message for it existed. This has been changed.

In order to reduce code duplication, the new error code NEEDS_CHECKING
has been introduced which makes ebml_parse check the AVIOContext's
status for errors.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt 239c7369e0 avformat/matroskadec: Improve read error/EOF checks I
ebml_read_num had a number of flaws:

1. The check for read errors/EOF was totally wrong. E.g. an EBML number
beginning with the invalid 0x00 would be considered a read error,
although it is just invalid data.
2. The check for read errors/EOF was done just once, after reading the
first byte of the EBML number. But errors/EOF can happen inbetween, of
course, and this wasn't checked.
3. There was no way to distinguish when EOF should be an error (because
the data has to be there) for which an error message should be emitted
and when it is not necessarily an error (namely during parsing of EBML
IDs). Such a possibility has been added and used.

All this was fixed; furthermore, the error messages for invalid EBML
numbers were improved and useless initializations were removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt a27e5398e2 avformat/matroskadec: Properly check return values
Up until now, webm_dash_manifest_cues used the return values of
ebml_read_num and ebml_read_length without checking for errors,
i.e. return values < 0. This has been changed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt 1215b3a5f3 avformat/matroskadec: Don't zero unnecessarily
It is only necessary to zero the initial allocated memory used to store
the size of laced frames if the block used Xiph lacing. Otherwise no
unintialized data was ever used, so use av_malloc instead of av_mallocz.

Also use the correct type for the allocations.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 20:59:33 -03:00
Andreas Rheinhardt bc3306fd5b avformat/matroskadec: Treat SimpleBlock as EBML_BIN
Up until now, the SimpleBlock was treated specially: It basically had
its own EBML category and it was also included in the BlockGroup EBML
syntax (although a SimpleBlock must not exist in a BlockGroup according
to the Matroska specifications). The latter fact also meant that
a MatroskaBlock's buffer was always unreferenced twice.
This has been changed: The type of a SimpleBlock is now an EBML_BIN.
The only way in which SimpleBlocks are still different is that they
share their associated structure with another unit (namely BlockGroup).
This is also used to unref the block: It is always unreferenced via the
BlockGroup syntax.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 20:11:24 -03:00
Andreas Rheinhardt ffa64a4db8 avformat/matroskadec: Don't keep old blocks
Before this commit, the Matroska muxer would read a block when required
to do so, parse the block, create and return the necessary AVPackets and
yet keep the blocks (in a dynamically allocated list), although they
aren't used at all any more. This has been changed. There is no list any
more and the block is immediately discarded after parsing.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 00:47:50 -03:00
Andreas Rheinhardt f3ca3e7f19 avformat/matroskadec: Remove redundant initialization
Every new element of an EbmlList is zeroed initially in
ebml_parse_elem, so that in particular a SimpleBlock's duration is
initialized to zero. Therefore it is unnecessary to initialize this
field again (for SimpleBlocks) in matroska_parse_cluster_incremental.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 00:47:50 -03:00