demux_lavf: implement bad hack for backward playback of wav

This commit generally fixes backward playback in wav, at least in most PCM cases.

libavformat's wav demuxer (and actually all other raw PCM based demuxers) has a specific behavior that breaks backward demuxing. The same thing also breaks persistent seek ranges in the demuxer cache, although that's less critical (it just means some cached data gets discarded). The backward demuxing issue is fatal: it logs the message "Demuxer not cooperating." and then typically stops doing anything.

Unlike modern media formats, these formats don't organize media data in packets, but just wrap a monolithic byte stream that is described by a header. This is good enough for PCM, which uses fixed frames (a single sample for all audio channels), and for which it would be too expensive to have per frame headers. libavformat (and mpv) is heavily packet based, and using a single packet for each PCM frame causes too much overhead. So they typically "bundle" multiple frames into a single packet. This packet size is obviously arbitrary, and in libavformat's case hardcoded in its source code.

The problem is that seeking doesn't respect this arbitrary packet boundary. Seeking is sample accurate. You can essentially seek inside a packet. The resulting packets will not be aligned with previously demuxed packets. This is normally OK. Backward seeking (and some other demuxer layer features) expects that demuxing an earlier demuxed file position eventually results in the same packets, regardless of the seeks that were done to get there. I like to call this "deterministic" demuxing. Backward demuxing in particular requires this to avoid overlaps, which would make it rather hard to get continuous output.

Fix this issue by detecting wav and hopefully other raw audio formats with a heuristic (even PCM has to be detected heuristically). Then, if a seek is requested, align the seek timestamps on the guessed number of samples in the audio packets returned by the demuxer.

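The alignment step described above can be sketched in isolation as follows. This is only an illustration, not the code from this commit: `align_seek_target` is a hypothetical helper, and it assumes the seek target is already expressed in samples (a stream time base of 1/sample_rate) and that every packet except possibly the last one has the same size.

```c
#include <stdint.h>

// Hypothetical helper (not part of mpv or libavformat): round a seek target,
// given in samples, down to a multiple of the number of PCM frames that fit
// into one demuxer packet.
//
//   packet_size  size in bytes of a packet previously returned by the demuxer
//   block_align  size in bytes of one PCM frame (one sample for all channels)
int64_t align_seek_target(int64_t target_samples, int packet_size,
                          int block_align)
{
    if (block_align <= 0)
        return target_samples;      // frame size unknown, leave target alone

    int64_t samples = packet_size / block_align;
    if (samples <= 0)
        return target_samples;      // packet smaller than a frame, give up

    // For non-negative targets this rounds down to a packet boundary.
    // (C's % truncates toward zero, so negative targets move toward 0.)
    return target_samples - target_samples % samples;
}
```

With 4096-byte packets of 16-bit stereo PCM (`block_align` of 4), a packet holds 1024 frames, so a target of sample 44100 would be moved back by 68 samples to 44032. The adjustment is always smaller than one packet's worth of samples.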
The heuristic excludes files with multiple streams. (Except "attachment" video streams, which could be an ID3 tag. Yes, FFmpeg allows ID3 tags on WAV files.) Such files will inherently use the packet concept in some way.

We don't know how the demuxer chooses the internal packet size, but we assume that it's fixed and aligned to PCM frame sizes. The frame size is most likely given by block_align (the native wav frame size, according to Microsoft). We possibly need to explicitly read and discard a packet if the seek is done without reading anything before that. We ignore any subsequent packet sizes; we need to avoid the very last packet, which likely has a different size.

This hack should be rather benign. In the worst case, it will "round" the seek target a little, but the maximum rounding amount is bounded. Maybe we _could_ round up if SEEK_FORWARD is specified, but I didn't bother.

An earlier commit fixed the same issue for mpv's demux_raw.

An alternative, and probably much better solution would be clipping decoded data by timestamp. demux.c could allow the type of overlap the wav demuxer introduces, and instruct the decoder to clip the output against the last decoded timestamp. There's already an infrastructure for this (demux_packet.end field) used by EDL/ordered chapters. Although this sounds like a good solution, mpv unfortunately uses floats for timestamps. The rounding errors break sample accuracy. Even if you used integers, you'd need a timebase that is sample accurate (not always easy, since EDL can merge tracks with different sample rates).
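To make the clipping alternative more concrete, here is a rough sketch of what clipping against a limit could look like if positions were tracked as integer sample indices. Nothing here exists in mpv: `struct pcm_block` and `clip_block_end` are hypothetical names, and the sketch assumes a single audio stream whose positions can be expressed exactly in samples, which is precisely the condition the paragraph above says is hard to guarantee.

```c
#include <stdint.h>

// Hypothetical description of a run of decoded PCM frames, in sample units.
struct pcm_block {
    int64_t start;   // index of the first frame in the block
    int64_t count;   // number of frames in the block
};

// Drop the tail of `blk` that reaches into data which was already output.
// `limit` is the sample index where the previously output range begins;
// only frames with an index below `limit` are kept.
struct pcm_block clip_block_end(struct pcm_block blk, int64_t limit)
{
    int64_t end = blk.start + blk.count;
    if (end > limit) {
        // Partial overlap keeps the frames before `limit`; a block that lies
        // entirely at or past `limit` becomes empty.
        blk.count = limit > blk.start ? limit - blk.start : 0;
    }
    return blk;
}
```

Because all arithmetic is done on integers, two overlapping blocks are cut at exactly the same frame, which is the property that floating point timestamps cannot guarantee.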
2019-05-25 16:59:20 +02:00 · 2019-05-25 16:59:20 +02:00 · 204a7725de
parent f24ff0e948
commit 204a7725de
2 changed files with 67 additions and 15 deletions
--- a/DOCS/man/options.rst
+++ b/DOCS/man/options.rst
@@ -445,7 +445,9 @@ Playback Control
      their behavior. There is no list, and the player usually does not detect
      them. Certain live streams (including TV captures) may exhibit problems
      in particular, as well as some lossy audio codecs. h264 intra-refresh is
-      known not to work due to problems with libavcodec.
+      known not to work due to problems with libavcodec. WAV and some other raw
+      audio formats tend to have problems - there are hacks for dealing with
+      them, which may or may not work.
    - Function with EDL/mkv ordered chapters is obviously broken.
@@ -478,16 +480,6 @@ Playback Control
      framestep commands are transposed. Backstepping will perform very
      expensive work to step forward by 1 frame.
-    - Backward playback in wav files does not work properly (and possibly
-      similar formats, typically raw audio formats used through libavformat).
-      This is because libavformat does not align seeks on the packet sizes it
-      uses. (The packet sizes are arbitrary and chosen by libavformat
-      internally. Seeks on the other hand are sample-exact, which leads to
-      overlapping packets if the backward playback state machine seeks back.
-      This is very complex to work around, so it doesn't attempt to.)
-      A workaround is to remux to a format like mkv, which enforces packet
-      boundaries. Making mpv cache the entire file in memory also works.
    - Backward playback with Vorbis does not work. libavcodec's decoder
      discards the first Vorbis packet (after each decoder reset), and the
      mechanism behind ``--audio-reversal-buffer`` assumes that it strictly
--- a/demux/demux_lavf.c
+++ b/demux/demux_lavf.c
@@ -140,6 +140,7 @@ struct format_hack {
    bool fix_editlists : 1;
    bool is_network : 1;
    bool no_seek : 1;
+    bool no_pcm_seek : 1;
 };
 #define BLACKLIST(fmt) {fmt, .ignore = true}
@@ -161,8 +162,8 @@ static const struct format_hack format_hacks[] = {
    {"mpeg", .use_stream_ids = true},
    {"mpegts", .use_stream_ids = true},
-    {"mp4", .skipinfo = true, .fix_editlists = true},
+    {"mp4", .skipinfo = true, .fix_editlists = true, .no_pcm_seek = true},
-    {"matroska", .skipinfo = true},
+    {"matroska", .skipinfo = true, .no_pcm_seek = true},
    {"v4l2", .no_seek = true},
@@ -217,6 +218,10 @@ typedef struct lavf_priv {
    struct demux_lavf_opts *opts;
    double mf_fps;
+    bool pcm_seek_hack_disabled;
+    AVStream *pcm_seek_hack;
+    int pcm_seek_hack_packet_size;
    // Proxying nested streams.
    struct nested_stream *nested;
    int num_nested;
@@ -677,6 +682,12 @@ static void handle_new_stream(demuxer_t *demuxer, int i)
            }
        }
+        if (!sh->attached_picture) {
+            // A real video stream probably means it's a packet based format.
+            priv->pcm_seek_hack_disabled = true;
+            priv->pcm_seek_hack = NULL;
+        }
        sh->codec->disp_w = codec->width;
        sh->codec->disp_h = codec->height;
        if (st->avg_frame_rate.num)
@@ -785,6 +796,25 @@ static void handle_new_stream(demuxer_t *demuxer, int i)
        sh->missing_timestamps = !!(priv->avif_flags & AVFMT_NOTIMESTAMPS);
        mp_tags_copy_from_av_dictionary(sh->tags, st->metadata);
        demux_add_sh_stream(demuxer, sh);
+        // Unfortunately, there is no better way to detect PCM codecs, other
+        // than listing them all manually. (Or other "frameless" codecs. Or
+        // rather, codecs with frames so small libavformat will put multiple of
+        // them into a single packet, but not preserve these artificial packet
+        // boundaries on seeking.)
+        if (sh->codec->codec && strncmp(sh->codec->codec, "pcm_", 4) == 0 &&
+            codec->block_align && !priv->pcm_seek_hack_disabled &&
+            priv->opts->hacks && !priv->format_hack.no_pcm_seek &&
+            st->time_base.num == 1 && st->time_base.den == codec->sample_rate)
+        {
+            if (priv->pcm_seek_hack) {
+                // More than 1 audio stream => usually doesn't apply.
+                priv->pcm_seek_hack_disabled = true;
+                priv->pcm_seek_hack = NULL;
+            } else {
+                priv->pcm_seek_hack = st;
+            }
+        }
    }
    select_tracks(demuxer, i);
@@ -1112,6 +1142,9 @@ static bool demux_lavf_read_packet(struct demuxer *demux,
        return true;
    }
+    if (priv->pcm_seek_hack == st && !priv->pcm_seek_hack_packet_size)
+        priv->pcm_seek_hack_packet_size = pkt->size;
    if (pkt->pts != AV_NOPTS_VALUE)
        dp->pts = pkt->pts * av_q2d(st->time_base);
    if (pkt->dts != AV_NOPTS_VALUE)
@@ -1139,6 +1172,7 @@ static void demux_seek_lavf(demuxer_t *demuxer, double seek_pts, int flags)
    lavf_priv_t *priv = demuxer->priv;
    int avsflags = 0;
    int64_t seek_pts_av = 0;
+    int seek_stream = -1;
    if (priv->optical_crap_hack) {
        if (flags & SEEK_FACTOR)
@@ -1169,13 +1203,39 @@ static void demux_seek_lavf(demuxer_t *demuxer, double seek_pts, int flags)
        seek_pts_av = seek_pts * AV_TIME_BASE;
    }
-    int r = av_seek_frame(priv->avfc, -1, seek_pts_av, avsflags);
+    // Hack to make wav seeking "deterministic". Without this, features like
+    // backward playback won't work.
+    if (priv->pcm_seek_hack && !priv->pcm_seek_hack_packet_size) {
+        // This might for example be the initial seek. Fuck it up like the
+        // bullshit it is.
+        AVPacket pkt = {0};
+        if (av_read_frame(priv->avfc, &pkt) >= 0)
+            priv->pcm_seek_hack_packet_size = pkt.size;
+        av_packet_unref(&pkt);
+        add_new_streams(demuxer);
+    }
+    if (priv->pcm_seek_hack && priv->pcm_seek_hack_packet_size &&
+        !(avsflags & AVSEEK_FLAG_BYTE))
+    {
+        int samples = priv->pcm_seek_hack_packet_size /
+                      priv->pcm_seek_hack->codecpar->block_align;
+        if (samples > 0) {
+            MP_VERBOSE(demuxer, "using bullshit libavformat PCM seek hack\n");
+            double pts = seek_pts_av / (double)AV_TIME_BASE;
+            seek_pts_av = pts / av_q2d(priv->pcm_seek_hack->time_base);
+            int64_t align = seek_pts_av % samples;
+            seek_pts_av -= align;
+            seek_stream = priv->pcm_seek_hack->index;
+        }
+    }
+    int r = av_seek_frame(priv->avfc, seek_stream, seek_pts_av, avsflags);
    if (r < 0 && (avsflags & AVSEEK_FLAG_BACKWARD)) {
        // When seeking before the beginning of the file, and seeking fails,
        // try again without the backwards flag to make it seek to the
        // beginning.
        avsflags &= ~AVSEEK_FLAG_BACKWARD;
-        r = av_seek_frame(priv->avfc, -1, seek_pts_av, avsflags);
+        r = av_seek_frame(priv->avfc, seek_stream, seek_pts_av, avsflags);
    }
    if (r < 0) {