The way this was added to FFmpeg is less than ideal, because it requires
text parsing in the Matroska demuxer. But in order to use the FFmpeg
webvtt-to-ass converter, we still have to mimic this in some way. We do
this by putting the parsing into sd_lavc_conv.c, before the subtitle
packet is passed to libavcodec. At least this keeps the ugliness out of
unrelated code.
There is some change that FFmpeg will fix their design eventually.
Instead of rewriting the parsing code, we simply borrow it from FFmpeg's
Matroska demuxer.
Not actually useful. This would break whenever a new text subtitle
format would be added, which requires a binary->text transformation.
(mov_text is one such format; disable it.) In general, we would have
to know which packet formats are binary, which we don't, so the only
reasonable way to handle this is a white list.
Currently, we are filtering libavformat style ASS packets by checking
whether they are prefixed "Dialogue: ". Unfortunately, comment packets
are demuxed too. These start with "Comment: ", so they are not caught.
Change the filtering, and use the codec ID instead. libavformat uses
"ssa" as codec ID for ASS subtitles, while mpv uses "ass". Also, at
least FFmpeg will change the ASS packet format to the same format mpv
and Matroska use, and identify these with "ass" as codec ID, so this is
works out nicely.
Audio and video had their own (very similar) functions to initialize an
AVPacket (ffmpeg's packet struct) from a demux_packet (mplayer's packet
struct). Add a common function for these.
Also use this function for sd_lavc_conv. This is actually a functional
change, as some libavfilter subtitle demuxers add weird out-of-band
stuff as side-data.
Normally, libavcodec subtitle converters will output a style header like
this as part of the extradata:
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,1,1,0,2,10,10,10,0,0
We don't want that, so use some bruteforce to get rid of them.
Otherwise this could happily open decoders for image subtitles or even
audio/video decoders. AV_CODEC_PROP_TEXT_SUB is a preprocessor symbol,
but it's still better to detect this properly instead of using #ifdef,
because these flags might as well be changed into enums sooner or later.
This allows using some formats that were not supported until now, like
WebVTT.
We still prefer the internal subtitle reader (subreader.c), because
1. Libav, and 2. random things which we probably want to keep, such as
control over formatting, codepage stuff, or various mysterious
postprecessing done in that code.