demux_lavf: implement bad hack for backward playback of wav

This commit generally fixes backward playback of wav files, at least in
most PCM cases.

libavformat's wav demuxer (and actually all other raw PCM based
demuxers) has a specific behavior that breaks backward demuxing. The
same thing also breaks persistent seek ranges in the demuxer cache,
although that's less critical (it just means some cached data gets
discarded). The backward demuxing issue is fatal: mpv will log the
message "Demuxer not cooperating.", and then typically stop doing
anything.

Unlike modern media formats, these formats don't organize media data in
packets, but just wrap a monolithic byte stream that is described by a
header. This is good enough for PCM, which uses fixed frames (a single
sample for all audio channels), and for which it would be too expensive
to have per frame headers.

libavformat (and mpv) is heavily packet based, and using a single packet
for each PCM frame causes too much overhead. So the raw PCM demuxers
typically "bundle" multiple frames into a single packet. This packet
size is obviously arbitrary, and in libavformat's case hardcoded in its
source code.

The problem is that seeking doesn't respect this arbitrary packet
boundary. Seeking is sample accurate. You can essentially seek inside a
packet. The resulting packets will not be aligned with previously
demuxed packets. This is normally OK.
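
To illustrate the misalignment, here is a minimal sketch (not mpv or
libavformat code). It assumes a demuxer that bundles a fixed number of
PCM frames per packet and, after a seek, starts its first packet exactly
at the seek target. The function names and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model: start position (in PCM frames) of the k-th packet
// read after seeking to seek_target, with a fixed number of frames per
// packet.
static int64_t packet_start_after_seek(int64_t seek_target,
                                       int frames_per_packet, int k)
{
    return seek_target + (int64_t)k * frames_per_packet;
}

// True if the packets read after the seek line up with the packets that
// were produced when demuxing linearly from position 0.
static bool seek_keeps_packet_grid(int64_t seek_target, int frames_per_packet)
{
    return seek_target % frames_per_packet == 0;
}
```

With 1024 frames per packet, linear demuxing yields packets starting at
0, 1024, 2048, and so on. Under this model, a seek to frame 1500 yields
packets starting at 1500, 2524, and so on, which overlap the earlier
ones.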

Backward seeking (and some other demuxer layer features) expect that
demuxing an earlier demuxed file position eventually results in the same
packets, regardless of the seeks that were done to get there. I like to
call this "deterministic" demuxing. Backward demuxing in particular
requires this to avoid overlaps, which would make it rather hard to get
continuous output.

Fix this issue by detecting wav and hopefully other raw audio formats
with a heuristic (even the PCM codec itself has to be detected
heuristically). Then, if a seek is requested, align the seek timestamps
on the guessed number of samples in the audio packets returned by the
demuxer.
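
The alignment step can be sketched in isolation as follows. This is a
simplified, standalone version of the arithmetic, not the code from the
diff: `align_seek_target` is a hypothetical helper, and it assumes the
seek target is already expressed in samples (i.e. a 1/sample_rate
timebase).

```c
#include <stdint.h>

// Hypothetical helper: round a seek target (in samples) down to a
// multiple of the guessed number of samples per packet.
// packet_size_bytes is the size of a previously read packet, block_align
// the size of one PCM frame in bytes.
static int64_t align_seek_target(int64_t target_samples,
                                 int packet_size_bytes, int block_align)
{
    if (block_align <= 0)
        return target_samples;      // frame size unknown: leave untouched
    int samples = packet_size_bytes / block_align;
    if (samples <= 0)
        return target_samples;      // packet smaller than one frame
    return target_samples - target_samples % samples;
}
```

For example, with 4096 byte packets and a block_align of 4, each packet
holds 1024 samples, so a seek to sample 5000 is moved back to sample
4096. The target moves by at most one packet minus one sample.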

The heuristic excludes files with multiple streams. (Except "attachment"
video streams, which could be an ID3 tag. Yes, FFmpeg allows ID3 tags on
WAV files.) Such files will inherently use the packet concept in some
way.
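
A rough, standalone sketch of such a detection heuristic is shown below.
`struct stream_desc` and `find_pcm_seek_stream` are hypothetical and much
simpler than the real demuxer state. The sketch mirrors the checks
visible in the diff (a "pcm_" codec name prefix, a known block_align, a
1/sample_rate timebase, no real video stream, at most one such audio
stream) and leaves out the per-format and user option switches.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical, simplified stream description (not an mpv or FFmpeg type).
struct stream_desc {
    const char *codec;      // codec name, e.g. "pcm_s16le"
    bool is_video;
    bool attached_picture;  // cover art style "video" stream
    int block_align;        // bytes per PCM frame, 0 if unknown
    int timebase_num, timebase_den;
    int sample_rate;
};

// Returns the index of the single stream the seek hack would apply to,
// or -1 if the heuristic rejects the file.
static int find_pcm_seek_stream(const struct stream_desc *s, int count)
{
    int found = -1;
    for (int i = 0; i < count; i++) {
        if (s[i].is_video) {
            if (!s[i].attached_picture)
                return -1;  // real video: probably a packet based format
            continue;       // attached pictures are ignored
        }
        bool pcm = s[i].codec && strncmp(s[i].codec, "pcm_", 4) == 0 &&
                   s[i].block_align > 0 &&
                   s[i].timebase_num == 1 &&
                   s[i].timebase_den == s[i].sample_rate;
        if (!pcm)
            continue;
        if (found >= 0)
            return -1;      // more than one PCM stream: doesn't apply
        found = i;
    }
    return found;
}
```

A wav file with one PCM stream plus an attached picture is accepted in
this sketch, while a file with two PCM streams or a real video stream is
rejected.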

We don't know how the demuxer chooses the internal packet size, but we
assume that it's fixed and aligned to PCM frame sizes. The frame size is
most likely given by block_align (the native wav frame size, according
to Microsoft). We possibly need to explicitly read and discard a packet
if the seek is done without reading anything before that. We ignore any
subsequent packet sizes; we need to avoid the very last packet, which
likely has a different size.

This hack should be rather benign. In the worst case, it will "round"
the seek target a little, but the maximum rounding amount is bounded.
Maybe we _could_ round up if SEEK_FORWARD is specified, but I didn't
bother.

An earlier commit fixed the same issue for mpv's demux_raw.

An alternative, and probably much better, solution would be clipping
decoded data by timestamp. demux.c could allow the type of overlap the
wav demuxer introduces, and instruct the decoder to clip the output
against the last decoded timestamp. There's already an infrastructure
for this (demux_packet.end field) used by EDL/ordered chapters.

Although this sounds like a good solution, mpv unfortunately uses floats
for timestamps. The rounding errors break sample accuracy. Even if you
used integers, you'd need a timebase that is sample accurate (not always
easy, since EDL can merge tracks with different sample rates).
wm4 2019-05-25 16:59:20 +02:00
parent f24ff0e948
commit 204a7725de
2 changed files with 67 additions and 15 deletions

@@ -445,7 +445,9 @@ Playback Control
       their behavior. There is no list, and the player usually does not detect
       them. Certain live streams (including TV captures) may exhibit problems
       in particular, as well as some lossy audio codecs. h264 intra-refresh is
-      known not to work due to problems with libavcodec.
+      known not to work due to problems with libavcodec. WAV and some other raw
+      audio formats tend to have problems - there are hacks for dealing with
+      them, which may or may not work.
     - Function with EDL/mkv ordered chapters is obviously broken.
@@ -478,16 +480,6 @@ Playback Control
       framestep commands are transposed. Backstepping will perform very
       expensive work to step forward by 1 frame.
-    - Backward playback in wav files does not work properly (and possibly
-      similar formats, typically raw audio formats used through libavformat).
-      This is because libavformat does not align seeks on the packet sizes it
-      uses. (The packet sizes are arbitrary and chosen by libavformat
-      internally. Seeks on the other hand are sample-exact, which leads to
-      overlapping packets if the backward playback state machine seeks back.
-      This is very complex to work around, so it doesn't attempt to.)
-      A workaround is to remux to a format like mkv, which enforces packet
-      boundaries. Making mpv cache the entire file in memory also works.
     - Backward playback with Vorbis does not work. libavcodec's decoder
       discards the first Vorbis packet (after each decoder reset), and the
       mechanism behind ``--audio-reversal-buffer`` assumes that it strictly

@@ -140,6 +140,7 @@ struct format_hack {
     bool fix_editlists : 1;
     bool is_network : 1;
     bool no_seek : 1;
+    bool no_pcm_seek : 1;
 };
 #define BLACKLIST(fmt) {fmt, .ignore = true}
@@ -161,8 +162,8 @@ static const struct format_hack format_hacks[] = {
     {"mpeg", .use_stream_ids = true},
     {"mpegts", .use_stream_ids = true},
-    {"mp4", .skipinfo = true, .fix_editlists = true},
-    {"matroska", .skipinfo = true},
+    {"mp4", .skipinfo = true, .fix_editlists = true, .no_pcm_seek = true},
+    {"matroska", .skipinfo = true, .no_pcm_seek = true},
     {"v4l2", .no_seek = true},
@@ -217,6 +218,10 @@ typedef struct lavf_priv {
     struct demux_lavf_opts *opts;
     double mf_fps;
+    bool pcm_seek_hack_disabled;
+    AVStream *pcm_seek_hack;
+    int pcm_seek_hack_packet_size;
     // Proxying nested streams.
     struct nested_stream *nested;
     int num_nested;
@@ -677,6 +682,12 @@ static void handle_new_stream(demuxer_t *demuxer, int i)
             }
         }
+        if (!sh->attached_picture) {
+            // A real video stream probably means it's a packet based format.
+            priv->pcm_seek_hack_disabled = true;
+            priv->pcm_seek_hack = NULL;
+        }
         sh->codec->disp_w = codec->width;
         sh->codec->disp_h = codec->height;
         if (st->avg_frame_rate.num)
@@ -785,6 +796,25 @@ static void handle_new_stream(demuxer_t *demuxer, int i)
         sh->missing_timestamps = !!(priv->avif_flags & AVFMT_NOTIMESTAMPS);
         mp_tags_copy_from_av_dictionary(sh->tags, st->metadata);
         demux_add_sh_stream(demuxer, sh);
+        // Unfortunately, there is no better way to detect PCM codecs, other
+        // than listing them all manually. (Or other "frameless" codecs. Or
+        // rather, codecs with frames so small libavformat will put multiple of
+        // them into a single packet, but not preserve these artificial packet
+        // boundaries on seeking.)
+        if (sh->codec->codec && strncmp(sh->codec->codec, "pcm_", 4) == 0 &&
+            codec->block_align && !priv->pcm_seek_hack_disabled &&
+            priv->opts->hacks && !priv->format_hack.no_pcm_seek &&
+            st->time_base.num == 1 && st->time_base.den == codec->sample_rate)
+        {
+            if (priv->pcm_seek_hack) {
+                // More than 1 audio stream => usually doesn't apply.
+                priv->pcm_seek_hack_disabled = true;
+                priv->pcm_seek_hack = NULL;
+            } else {
+                priv->pcm_seek_hack = st;
+            }
+        }
     }
     select_tracks(demuxer, i);
@@ -1112,6 +1142,9 @@ static bool demux_lavf_read_packet(struct demuxer *demux,
         return true;
     }
+    if (priv->pcm_seek_hack == st && !priv->pcm_seek_hack_packet_size)
+        priv->pcm_seek_hack_packet_size = pkt->size;
     if (pkt->pts != AV_NOPTS_VALUE)
         dp->pts = pkt->pts * av_q2d(st->time_base);
     if (pkt->dts != AV_NOPTS_VALUE)
@@ -1139,6 +1172,7 @@ static void demux_seek_lavf(demuxer_t *demuxer, double seek_pts, int flags)
     lavf_priv_t *priv = demuxer->priv;
     int avsflags = 0;
     int64_t seek_pts_av = 0;
+    int seek_stream = -1;
     if (priv->optical_crap_hack) {
         if (flags & SEEK_FACTOR)
@@ -1169,13 +1203,39 @@ static void demux_seek_lavf(demuxer_t *demuxer, double seek_pts, int flags)
         seek_pts_av = seek_pts * AV_TIME_BASE;
     }
-    int r = av_seek_frame(priv->avfc, -1, seek_pts_av, avsflags);
+    // Hack to make wav seeking "deterministic". Without this, features like
+    // backward playback won't work.
+    if (priv->pcm_seek_hack && !priv->pcm_seek_hack_packet_size) {
+        // This might for example be the initial seek. Fuck it up like the
+        // bullshit it is.
+        AVPacket pkt = {0};
+        if (av_read_frame(priv->avfc, &pkt) >= 0)
+            priv->pcm_seek_hack_packet_size = pkt.size;
+        av_packet_unref(&pkt);
+        add_new_streams(demuxer);
+    }
+    if (priv->pcm_seek_hack && priv->pcm_seek_hack_packet_size &&
+        !(avsflags & AVSEEK_FLAG_BYTE))
+    {
+        int samples = priv->pcm_seek_hack_packet_size /
+                      priv->pcm_seek_hack->codecpar->block_align;
+        if (samples > 0) {
+            MP_VERBOSE(demuxer, "using bullshit libavformat PCM seek hack\n");
+            double pts = seek_pts_av / (double)AV_TIME_BASE;
+            seek_pts_av = pts / av_q2d(priv->pcm_seek_hack->time_base);
+            int64_t align = seek_pts_av % samples;
+            seek_pts_av -= align;
+            seek_stream = priv->pcm_seek_hack->index;
+        }
+    }
+    int r = av_seek_frame(priv->avfc, seek_stream, seek_pts_av, avsflags);
     if (r < 0 && (avsflags & AVSEEK_FLAG_BACKWARD)) {
         // When seeking before the beginning of the file, and seeking fails,
         // try again without the backwards flag to make it seek to the
         // beginning.
         avsflags &= ~AVSEEK_FLAG_BACKWARD;
-        r = av_seek_frame(priv->avfc, -1, seek_pts_av, avsflags);
+        r = av_seek_frame(priv->avfc, seek_stream, seek_pts_av, avsflags);
     }
     if (r < 0) {