mpv/DOCS/tech/general.txt

So, I'll describe how this stuff works.

The main modules:

1. streamer.c: this is the input layer; it reads the file, the VCD or
   stdin. What it has to provide: buffering by sector, seek and skip
   functions, and reading by bytes or by blocks of any size. The stream_t
   structure describes the input stream (file/device).

2. demuxer.c: this demultiplexes the input into audio and video
   channels, and handles reading them as buffered packets.
	 demuxer.c is basically a framework that is the same for all
	 input formats, and there is a parser for each of them (mpeg-es,
	 mpeg-ps, avi, avi-ni, asf); these are in the demux_*.c files.
	 The structure is demuxer_t. There is only one demuxer.

2.a. demux_packet_t, that is DP.
   Contains one chunk (avi) or packet (asf,mpg). Because their sizes differ,
	 they are stored in memory as a linked list.

2.b. demuxer stream, that is DS.
   Struct: demux_stream_t
   Every channel (a/v) has one. It contains the packets for the stream
         (see 2.a). For now, there can be 3 per demuxer:
	 - audio (d_audio)
	 - video (d_video)
	 - DVD subtitle (d_dvdsub)
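A minimal sketch of how a DS can keep its DPs as a linked list, oldest first. The structures below are simplified stand-ins with illustrative fields only (the real demux_packet_t and demux_stream_t carry more), and ds_take_packet() is a hypothetical helper.

```c
#include <stddef.h>

/* Simplified, hypothetical DP: the real structure has more fields. */
typedef struct demux_packet {
    int len;                    /* payload size; differs per packet */
    double pts;                 /* presentation timestamp */
    unsigned char *buffer;      /* payload */
    struct demux_packet *next;  /* next (newer) packet in the list */
} demux_packet_t;

/* Simplified, hypothetical DS: just the packet queue. */
typedef struct {
    demux_packet_t *first;      /* oldest buffered packet */
    demux_packet_t *last;       /* newest buffered packet */
    int packs;                  /* number of buffered packets */
    int bytes;                  /* total buffered payload bytes */
} demux_stream_t;

/* Append a packet at the tail of the stream's list. */
static void ds_add_packet(demux_stream_t *ds, demux_packet_t *dp)
{
    dp->next = NULL;
    if (ds->last)
        ds->last->next = dp;
    else
        ds->first = dp;
    ds->last = dp;
    ds->packs++;
    ds->bytes += dp->len;
}

/* Hypothetical helper: detach and return the oldest packet,
 * or NULL if the list is empty. */
static demux_packet_t *ds_take_packet(demux_stream_t *ds)
{
    demux_packet_t *dp = ds->first;
    if (!dp)
        return NULL;
    ds->first = dp->next;
    if (!ds->first)
        ds->last = NULL;
    ds->packs--;
    ds->bytes -= dp->len;
    dp->next = NULL;
    return dp;
}
```

A list fits here because packets arrive with different sizes and are consumed strictly in arrival order, so only the head and tail are ever touched.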

2.c. stream header. There are 2 types (for now): sh_audio_t and sh_video_t
   This contains every parameter essential for decoding, such as input/output
	 buffers, chosen codec, fps, etc. There is one for every stream in
	 the file: at least one for video, another one if sound is present,
	 and if there are more streams, one structure for each of them.
	 These are filled according to the header (avi/asf), or by demux_mpg.c
	 (mpg) when it finds a new stream. When a new stream is found,
	 the ====> Found audio/video stream: <id>  message is displayed.

	 The chosen stream header and its demuxer stream are connected together
	 (ds->sh and sh->ds) to simplify usage, so it's enough to pass either
	 the ds or the sh, depending on the function.

	 For example: we have an asf file with 6 streams inside it, 1 audio and
	 5 video. While reading the header, 6 sh structs are created, 1
	 audio and 5 video. When it starts reading packets, it chooses the
	 streams of the first audio and video packets found, and sets the sh
	 pointers of d_audio and d_video accordingly. From then on it reads
	 only these streams. Of course the user can force a specific stream
	 with the -vid and -aid switches.
	 A good example of this is the DVD, where the English stream is not
	 always the first, because the language order differs from VOB to VOB :)
	 That's when we have to use, for example, the -aid 128 switch.

  Now, how does this reading work?
	 - demux_read_data() in demuxer.c is called by the codecs. Its
	   arguments say how many bytes to read, where to put them (memory
	   address), and from which DS.
	 - It checks whether the given DS's buffer contains data; if so, it
	   reads as much as needed from there. If there isn't enough, it calls
	   ds_fill_buffer(), which:
	 - checks whether the given DS has buffered packets (DPs); if so, it
	   moves the oldest one to the buffer and reading continues. If the
	   list is empty, it calls demux_fill_buffer(), which:
	 - calls the parser for the input format, which reads further into
	   the file and moves the packets it finds to their DS's buffers.
		 If we want an audio packet but only video packets are
		 available (or vice versa), the other stream's queue keeps
		 growing, and sooner or later the
		 DEMUXER: Too many (%d in %d bytes) audio packets in the buffer
		 error shows up.
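The three-level read chain above can be modeled roughly as follows. This is a toy sketch, not the real code: the toy_* names, the string "packets" tagged 'a'/'v', and the fixed QSIZE limit are all hypothetical. It does show the fallback order (DS buffer, then queued DPs, then the parser) and how reading only one stream makes the other stream's queue fill up.

```c
#include <string.h>

#define QSIZE 4   /* hypothetical per-stream packet limit */

typedef struct {
    const char *queue[QSIZE];   /* buffered packets (DPs), oldest first */
    int qhead, qcount;
    const char *cur;            /* packet currently being consumed */
    int pos, len;               /* read position in cur, size of cur */
} toy_ds_t;

typedef struct {
    const char *const *packets; /* input: "a..." = audio, "v..." = video */
    int next, count;
    toy_ds_t audio, video;
} toy_demuxer_t;

/* Like demux_fill_buffer(): the parser reads one more packet and queues
 * it on its stream. Returns 0 at EOF or when that queue is full
 * (the "Too many packets" situation). */
static int toy_demux_fill_buffer(toy_demuxer_t *d)
{
    if (d->next >= d->count)
        return 0;
    const char *p = d->packets[d->next];
    toy_ds_t *ds = (p[0] == 'a') ? &d->audio : &d->video;
    if (ds->qcount == QSIZE)
        return 0;
    ds->queue[(ds->qhead + ds->qcount++) % QSIZE] = p + 1; /* skip tag */
    d->next++;
    return 1;
}

/* Like ds_fill_buffer(): make the oldest queued packet current,
 * asking the parser for more while this stream's queue is empty. */
static int toy_ds_fill_buffer(toy_demuxer_t *d, toy_ds_t *ds)
{
    while (ds->qcount == 0)
        if (!toy_demux_fill_buffer(d))
            return 0;
    ds->cur = ds->queue[ds->qhead];
    ds->qhead = (ds->qhead + 1) % QSIZE;
    ds->qcount--;
    ds->pos = 0;
    ds->len = (int)strlen(ds->cur);
    return 1;
}

/* Like demux_read_data(): copy up to len bytes from ds into mem. */
static int toy_demux_read_data(toy_demuxer_t *d, toy_ds_t *ds,
                               char *mem, int len)
{
    int done = 0;
    while (done < len) {
        if (ds->pos >= ds->len && !toy_ds_fill_buffer(d, ds))
            break;                      /* EOF or queue overflow */
        int n = ds->len - ds->pos;
        if (n > len - done)
            n = len - done;
        memcpy(mem + done, ds->cur + ds->pos, (size_t)n);
        ds->pos += n;
        done += n;
    }
    return done;
}
```

With the input { "aAB", "vXY", "aCD" }, reading 4 audio bytes returns "ABCD" and leaves the video packet queued on the video DS for later.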

So far, so good. I want to move these modules to a separate lib.

Now, let's go on:

3. mplayer.c - ooh, he's the boss :)
    Its main purpose is connecting the other modules, and maintaining A/V
    sync.

    The current position of each stream is kept in the 'timer' field of
    the corresponding stream header (sh_audio / sh_video).

	 The structure of the playing loop:
         while(not EOF) {
             fill audio buffer (read & decode audio) + increase a_frame
             read & decode a single video frame + increase v_frame
             sleep  (wait until a_frame>=v_frame)
             display the frame
             apply A-V PTS correction to a_frame
             check for keys -> pause,seek,...
         }
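The loop above can be simulated with just the two clocks. This is a hypothetical model: nothing is decoded or displayed, and time only advances as audio is played, so the "sleep" step is modeled by letting the audio clock run until it reaches the frame's time.

```c
/* Hypothetical model of the playing loop: only the clocks (seconds). */
typedef struct {
    double a_frame;   /* time of the audio played so far */
    double v_frame;   /* time of the video decoded so far */
    int shown;        /* frames displayed */
} av_clock_t;

static void play_loop(av_clock_t *c, double audio_step,
                      double frame_duration, double end)
{
    while (c->v_frame < end) {              /* while (not EOF) */
        c->a_frame += audio_step;           /* fill audio buffer */
        c->v_frame += frame_duration;       /* decode one video frame */
        while (c->a_frame < c->v_frame)     /* "sleep": audio keeps playing */
            c->a_frame += audio_step;
        c->shown++;                         /* display the frame */
        /* A-V PTS correction and key handling would go here */
    }
}
```

For example, with 0.125 s audio steps, 0.25 s frames and a 2 s file, 8 frames are shown and the audio clock ends at exactly 2.0 s.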

	 While playing, it increases these variables by the duration of the
	 played audio/video:
	 - for audio this is played bytes / sh_audio->o_bps
	 Note: i_bps = number of compressed bytes for one second of audio
	       o_bps = number of uncompressed bytes for one second of audio
		   (this is = bytes per sample * samplerate * channels)
	 - for video this is usually == 1.0/fps, but note that fps
	 doesn't really matter for video: asf, for example, doesn't have it;
	 instead there is a "duration", which can change per frame.
	 MPEG2 has "repeat_count", which delays the frame by 1-2.5 ...
	 Maybe only AVI and MPEG1 have a fixed fps.
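The timer increments above are simple arithmetic; a small sketch with hypothetical helper names. For 16-bit (2 bytes per sample) stereo at 44100 Hz, o_bps = 2 * 44100 * 2 = 176400, so playing 88200 bytes advances the audio timer by 0.5 s.

```c
/* o_bps of uncompressed PCM: bytes per sample * samplerate * channels. */
static int pcm_o_bps(int bytes_per_sample, int samplerate, int channels)
{
    return bytes_per_sample * samplerate * channels;
}

/* Audio timer increment: played bytes / o_bps. */
static double audio_increment(int played_bytes, int o_bps)
{
    return (double)played_bytes / o_bps;
}

/* Video timer increment for a fixed-fps stream; formats with per-frame
 * durations (e.g. asf) would add that duration instead. */
static double video_increment(double fps)
{
    return 1.0 / fps;
}
```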

	 So everything works fine as long as the audio and video are in perfect
	 sync: the audio plays continuously and provides the timing, and once
	 the time of a frame has passed, the next frame is displayed.
	 But what if the two aren't synchronized in the input file?
	 That's where PTS correction kicks in. The input demuxers read the PTS
	 (presentation timestamp) of the packets, and with it we can see whether
	 the streams are synchronized. MPlayer can then correct a_frame, within
	 a given maximum bound (see the -mc option). The sum of the
	 corrections is kept in c_total.
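A minimal sketch of a bounded correction, assuming the measured A-V difference is clamped to the -mc limit on each step and the applied amounts are summed into c_total. The function name and the sign convention are hypothetical; the real loop may scale or filter the difference differently.

```c
/* Hypothetical: clamp the measured A-V difference to +-max_corr
 * (the -mc value) and accumulate what was applied in c_total. */
static double pts_correction(double av_diff, double max_corr,
                             double *c_total)
{
    double x = av_diff;
    if (x > max_corr)
        x = max_corr;
    if (x < -max_corr)
        x = -max_corr;
    *c_total += x;
    return x;     /* amount to apply to a_frame this step */
}
```

Clamping keeps a single bad timestamp from making the picture jump; a large real offset is instead absorbed over several steps.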

	 Of course this is not everything; several things suck.
	 For example the soundcard's delay, which has to be compensated by
	 MPlayer! The audio delay is the sum of all these:
	 - bytes read since the last timestamp:
	   t1 = d_audio->pts_bytes/sh_audio->i_bps
	 - with Win32/ACM codecs, the bytes stored in the audio input buffer
	   t2 = a_in_buffer_len/sh_audio->i_bps
	 - uncompressed bytes in audio out buffer
	   t3 = a_buffer_len/sh_audio->o_bps
	 - not yet played bytes stored in the soundcard's (or DMA's) buffer
	   t4 = get_audio_delay()/sh_audio->o_bps

	 From this we can calculate the PTS of the audio being played right
	 now; comparing it with the video's PTS gives us the difference!
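One way to combine the four terms, as a hedged sketch: it assumes t1 moves the position forward from the last PTS, while t2..t4 measure audio that was already read but not yet heard, so they are subtracted. The struct and function names are hypothetical; t2 is simply zero when no input buffer is used.

```c
/* Hypothetical snapshot of the audio pipeline state. */
typedef struct {
    double pts;          /* last PTS read by the demuxer */
    int pts_bytes;       /* compressed bytes read since that PTS */
    int a_in_buffer_len; /* compressed bytes waiting for the decoder */
    int a_buffer_len;    /* decoded bytes waiting for the driver */
    int card_delay;      /* unplayed bytes in the soundcard buffer */
    int i_bps, o_bps;
} audio_state_t;

/* PTS of the audio the listener hears right now. */
static double played_audio_pts(const audio_state_t *s)
{
    double t1 = (double)s->pts_bytes / s->i_bps;
    double t2 = (double)s->a_in_buffer_len / s->i_bps;
    double t3 = (double)s->a_buffer_len / s->o_bps;
    double t4 = (double)s->card_delay / s->o_bps;
    return s->pts + t1 - (t2 + t3 + t4);
}

/* A-V difference; positive means audio is ahead of video. */
static double av_difference(const audio_state_t *s, double video_pts)
{
    return played_audio_pts(s) - video_pts;
}
```

With pts = 10.0, t1 = 1.0 s and t2 = t3 = t4 = 0.25 s, the audio being heard is at 10.25 s.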

	 Life didn't get simpler with AVI. There's the "official" timing
	 method, the BPS-based one: the header says how many compressed
	 audio bytes or chunks belong to one second.
	 In the AVI stream header there are 2 important fields,
	 dwSampleSize and the dwRate/dwScale pair:
	 - If dwSampleSize is 0, it's a VBR stream, so its bitrate
	 isn't constant. This means that 1 chunk stores 1 sample, and
	 dwRate/dwScale gives the chunks/sec value.
	 - If dwSampleSize is >0, it's constant bitrate, and the
	 time can be computed this way:  time = (bytepos/dwSampleSize) /
	 (dwRate/dwScale) (so the sample number is divided by the
	 samplerate). In this case the audio can be handled as a stream,
	 which can be cut into chunks, but can also be a single chunk.
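Both AVI cases in one function, as a sketch of the formulas above (the function name is hypothetical). For 16-bit stereo PCM, dwSampleSize = 4 and dwRate/dwScale = 44100/1, so byte position 176400 is sample 44100, i.e. 1.0 s.

```c
/* Playback time of AVI audio from the stream header fields.
 * VBR (dwSampleSize == 0): 1 chunk = 1 sample, so count chunks.
 * CBR (dwSampleSize > 0): convert the byte position to samples. */
static double avi_audio_time(unsigned dwSampleSize, unsigned dwRate,
                             unsigned dwScale, long chunks, long bytepos)
{
    double samples_per_sec = (double)dwRate / dwScale;
    if (dwSampleSize == 0)
        return chunks / samples_per_sec;
    return ((double)bytepos / dwSampleSize) / samples_per_sec;
}
```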

	 The other method can be used only for interleaved files: a
	 timestamp (PTS) value can be calculated from the order of the chunks.
	 The PTS of a video chunk is simple: chunk number / fps.
	 An audio chunk gets the same PTS as the video chunk before it.
	 We have to pay attention to the so-called "audio preload", that is,
	 a delay between the audio and video streams. This is
	 usually 0.5-1.0 sec, but can be totally different.
	 Until now the exact value had to be measured, but now demux_avi.c
	 handles it: at the first audio chunk after the first video chunk, it
	 calculates the A/V difference and takes this as the audio preload.
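A sketch of deriving PTS from chunk order, assuming a hypothetical string of 'V'/'A' characters stands in for the chunk sequence. Video chunk n gets n / fps, and each audio chunk inherits the PTS of the last video chunk before it; audio preload handling is left out.

```c
/* Hypothetical: order[] lists chunk types ('V' or 'A') in file order;
 * pts_out[i] receives the PTS of chunk i. */
static void interleaved_pts(const char *order, double fps, double *pts_out)
{
    long vframes = 0;             /* video chunks seen so far */
    double last_video_pts = 0.0;
    for (int i = 0; order[i]; i++) {
        if (order[i] == 'V') {
            last_video_pts = vframes / fps;   /* chunk number / fps */
            vframes++;
        }
        pts_out[i] = last_video_pts;          /* audio: previous video */
    }
}
```

For "VAVVA" at 4 fps this yields 0, 0, 0.25, 0.5, 0.5.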

3.a. audio playback:
	 Some words on audio playback:
	 Playing itself isn't the hard part; these are:
	 1. knowing when to write into the buffer, without blocking
	 2. knowing how much of what we wrote has already been played
	 The first is needed for audio decoding, and to keep the buffer
	 full (so the audio never skips). The second is needed for
	 correct timing, because some soundcards delay even 3-7 seconds,
	 which can't be ignored.
	 To solve this, OSS offers several possibilities:
	 - ioctl(SNDCTL_DSP_GETODELAY): tells how many unplayed bytes are in
	   the soundcard's buffer -> perfect for timing, but not all drivers
	   support it :(
	 - ioctl(SNDCTL_DSP_GETOSPACE): tells how much we can write into the
	   soundcard's buffer without blocking. If the driver doesn't
	   support GETODELAY, we can use this to estimate the delay.
	 - select(): should tell whether we can write into the buffer without
	   blocking. Unfortunately it doesn't say how much we could :((
	   Also, it doesn't work, or works badly, with some drivers.
	   Only used if none of the above works.
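A sketch of the GETOSPACE fallback: the unplayed amount is estimated as the total buffer size minus the free space. The struct below mirrors the fields of OSS's audio_buf_info (filled by ioctl(fd, SNDCTL_DSP_GETOSPACE, &info)) but is declared locally, so the sketch compiles without the OSS header; oss_unplayed_bytes() is a hypothetical helper.

```c
/* Local mirror of OSS's audio_buf_info fields. */
typedef struct {
    int fragments;   /* full fragments writable without blocking */
    int fragstotal;  /* total fragments in the buffer */
    int fragsize;    /* size of one fragment in bytes */
    int bytes;       /* bytes writable without blocking */
} oss_space_t;

/* Unplayed bytes in the card's buffer. odelay is the GETODELAY
 * result, or negative if the driver doesn't support it; then the
 * value is estimated from the GETOSPACE info instead. */
static int oss_unplayed_bytes(int odelay, const oss_space_t *space)
{
    if (odelay >= 0)
        return odelay;
    return space->fragstotal * space->fragsize - space->bytes;
}
```

For example, with 4 fragments of 4096 bytes and 8192 bytes free, about 8192 bytes are still waiting to be played.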

4. Codecs. They are separate libs.
   For example libac3, libmpeg2, xa/*, alaw.c, opendivx/*, loader, mp3lib.

   mplayer.c doesn't call them directly, but through the dec_audio.c and
   dec_video.c files, so mplayer.c doesn't have to know anything about
   the codec.
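A hypothetical sketch of that indirection: the player calls one entry point, which looks the codec up in a table of function pointers, so the player itself never references a codec library. All names here are illustrative, and the only "codec" is a pass-through stand-in.

```c
#include <string.h>

/* Hypothetical decoder signature: returns decoded bytes. */
typedef int (*decode_fn)(const unsigned char *in, int in_len,
                         unsigned char *out, int out_max);

/* Stand-in codec: copies input, like raw PCM. */
static int pcm_decode(const unsigned char *in, int in_len,
                      unsigned char *out, int out_max)
{
    int n = in_len < out_max ? in_len : out_max;
    memcpy(out, in, (size_t)n);
    return n;
}

static const struct {
    const char *name;
    decode_fn decode;
} codecs[] = {
    { "pcm", pcm_decode },
};

/* The only function the player calls; -1 if the codec is unknown. */
static int player_decode_audio(const char *codec,
                               const unsigned char *in, int in_len,
                               unsigned char *out, int out_max)
{
    for (size_t i = 0; i < sizeof codecs / sizeof codecs[0]; i++)
        if (strcmp(codecs[i].name, codec) == 0)
            return codecs[i].decode(in, in_len, out, out_max);
    return -1;
}
```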

5. libvo: this displays the frame.
   The constants for the different pixel formats are defined in img_format.h,
	 and using them is mandatory.

   Each vo driver _has_ to implement these:

   query_format()  -  queries if a given pixelformat is supported.
		      return value:   flags:
			0x1  -  supported (by hardware or conversion)
			0x2  -  supported (by hardware, without conversion)
			0x4  -  sub/osd supported (has draw_alpha)
   IMPORTANT: it's mandatory that every vo driver supports the YV12 format,
	 and one (or both) of BGR15 and BGR24, with conversion if needed.
	 If these aren't supported, not every codec will work! The mpeg codecs
	 can output only YV12, and the older win32 DLLs only 15 and 24bpp.
	 There is a fast MMX-optimized 15->16bpp converter, so the conversion
	 doesn't cause a significant slowdown!

    The BPP table, if the driver can't change bpp:
	 current bpp		has to accept these
	     15				15
	     16				15,16
	     24				24
	     24,32			24,32

    If it can change bpp (for example DGA 2, fbdev, svgalib), then if possible
	 we have to change to the desired bpp. If the hardware doesn't support it,
	 we have to change to the closest one and do conversion!
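A hypothetical query_format() for a driver stuck on a 16bpp screen, tying the flag list and the BPP table together. The format enum and flag macro names are illustrative, not the real img_format.h constants; the flag values are the ones listed above. Following the table, a 16bpp display accepts 15 and 16bpp, and it also accepts YV12 through conversion, as required.

```c
/* Illustrative format IDs (the real ones are in img_format.h). */
enum fmt { FMT_YV12, FMT_BGR15, FMT_BGR16, FMT_BGR24, FMT_BGR32 };

#define QF_SUPPORTED 0x1  /* supported, by hardware or conversion */
#define QF_HW        0x2  /* supported by hardware, no conversion */
#define QF_OSD       0x4  /* sub/osd supported (has draw_alpha) */

static int query_format(enum fmt f)
{
    switch (f) {
    case FMT_BGR16:                     /* native screen depth */
        return QF_SUPPORTED | QF_HW | QF_OSD;
    case FMT_BGR15:                     /* 15->16bpp conversion */
    case FMT_YV12:                      /* YUV->RGB conversion */
        return QF_SUPPORTED | QF_OSD;
    default:                            /* 24/32bpp not accepted */
        return 0;
    }
}
```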

    init(): this is called before the first frame is displayed;
	 it initializes buffers, etc.

    draw_slice(): this displays YV12 pictures (3 planes: one full-sized plane
	 that contains the brightness (Y), and 2 quarter-sized planes that contain
	 the colour info (U,V)). MPEG codecs (libmpeg2, opendivx) use this. It
	 doesn't have to display the whole frame, only update small parts of it.
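The plane arithmetic behind YV12, as a small sketch with hypothetical helper names, assuming even width and height. Each chroma plane has half the width and half the height of the Y plane, so a 320x240 picture needs 76800 + 2 * 19200 = 115200 bytes. A slice starting at picture row y0 begins at row y0/2 in the chroma planes. (In memory, YV12 stores the V plane before the U plane.)

```c
typedef struct { int y_size, chroma_size, total; } yv12_layout_t;

/* Plane sizes of a w x h YV12 picture (w, h even). */
static yv12_layout_t yv12_layout(int w, int h)
{
    yv12_layout_t l;
    l.y_size = w * h;                    /* full-size Y plane */
    l.chroma_size = (w / 2) * (h / 2);   /* each of V and U */
    l.total = l.y_size + 2 * l.chroma_size;
    return l;
}

/* Byte offsets where a slice starting at picture row y0 (even)
 * begins in the Y plane and in each chroma plane. */
static void yv12_slice_offsets(int w, int y0, int *y_off, int *c_off)
{
    *y_off = y0 * w;
    *c_off = (y0 / 2) * (w / 2);
}
```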

    draw_frame(): this is the older interface; it displays only complete
	 frames, and handles only packed formats (YUY2, RGB/BGR).
	 Win32 codecs use this (DivX, Indeo, etc).

    draw_alpha(): this displays subtitles and OSD.
	 It's a bit tricky to use, since it's not part of the libvo API,
	 but a callback. flip_page() has to call vo_draw_text(), passing
	 the size of the screen and the draw_alpha() implementation for the
	 current pixel format (as a function pointer). vo_draw_text() checks
	 the characters to draw, and calls draw_alpha() for each.
	 To help, osd.c contains a draw_alpha for each pixel format; use these
	 if possible!
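A toy model of that callback flow. toy_draw_text() stands in for vo_draw_text(): it lays characters out in fixed 8x8 cells along the bottom of the screen and calls the driver's draw_alpha() for each one that fits. The parameter list of draw_alpha_fn, the glyph cells and all names are assumptions for illustration; real glyphs come from a font.

```c
#include <stddef.h>

#define GLYPH_W 8
#define GLYPH_H 8

typedef void (*draw_alpha_fn)(int x0, int y0, int w, int h,
                              const unsigned char *src,
                              const unsigned char *srca, int stride);

static const unsigned char glyph[GLYPH_W * GLYPH_H];       /* blank */
static const unsigned char glyph_alpha[GLYPH_W * GLYPH_H];

/* Returns the number of characters drawn. */
static int toy_draw_text(int dxs, int dys, const char *text,
                         draw_alpha_fn draw_alpha)
{
    int x = 0, y = dys - GLYPH_H, drawn = 0;
    for (size_t i = 0; text[i] && x + GLYPH_W <= dxs; i++, x += GLYPH_W) {
        draw_alpha(x, y, GLYPH_W, GLYPH_H, glyph, glyph_alpha, GLYPH_W);
        drawn++;
    }
    return drawn;
}

/* Example driver callback: a real one blends src into the frame;
 * this one only counts calls. */
static int alpha_calls;
static void count_draw_alpha(int x0, int y0, int w, int h,
                             const unsigned char *src,
                             const unsigned char *srca, int stride)
{
    (void)x0; (void)y0; (void)w; (void)h;
    (void)src; (void)srca; (void)stride;
    alpha_calls++;
}
```

Because the driver only supplies the callback, the text layout code is shared by all drivers, and each driver blends glyphs in its own pixel format.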

    flip_page(): this is called after each frame; it displays the buffer for
	 real. This is 'swapbuffers' when double-buffering.

6. libao2: this controls audio playback

  As in libvo (see 5.), there are several drivers here, all based on the same API:

static int control(int cmd, int arg);
  This is for reading/setting driver-specific and other special parameters.
  Not really used for now.

static int init(int rate,int channels,int format,int flags);
  Initializes the driver: opens the device, and sets the sample rate,
  channels and sample format parameters.
  Sample format: usually AFMT_S16_LE or AFMT_U8; for more definitions see
  the dec_audio.c and linux/soundcard.h files!

static void uninit();
  Guess what.
  OK, I'll help: it closes the device. Not (yet) called on exit.

static void reset();
  Resets the device. To be exact, it discards the buffers' contents,
  so after reset() the previously received data won't be played.
  (called on pause or seek)

static int get_space();
  Returns how many bytes can be written into the audio buffer without
  blocking (making the calling process wait). If the buffer is (nearly) full,
  has to return 0!
  If it never gives 0, MPlayer won't work!

static int play(void* data,int len,int flags);
  Plays a bit of audio, which is received through the "data" memory area,
  with a size of "len". The "flags" argument isn't used yet. It has to copy
  the data, because it can be overwritten after the call returns. It
  doesn't have to use all the bytes, but it has to return how many were
  used (copied to the buffer).

static int get_delay();
  Has to return how many bytes are in the audio buffer. Be exact, if possible,
  since the whole timing depends on this! In the worst case, return the size
  of the buffer.

!!! Because the video is synchronized to the audio (card), it's very important
!!! that the get_space and get_delay functions be correctly implemented!
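A minimal sketch of a software-buffered driver that honours these rules: get_space() returns 0 when the buffer is full, play() copies only what fits and reports how much it took, get_delay() reports the unplayed bytes, and reset() discards them. The ring buffer, its size and hw_consume() (which simulates the hardware draining audio) are hypothetical; a real driver would query the device instead.

```c
#define AO_BUFSIZE 16384  /* hypothetical driver buffer size in bytes */

static unsigned char ring[AO_BUFSIZE];
static int ring_start;    /* index of the oldest unplayed byte */
static int ring_fill;     /* number of unplayed bytes */

/* Discard everything not yet played (pause/seek). */
static void reset(void) { ring_start = 0; ring_fill = 0; }

/* Bytes writable without blocking; 0 when the buffer is full. */
static int get_space(void) { return AO_BUFSIZE - ring_fill; }

/* Copy as much as fits; return the number of bytes consumed. */
static int play(void *data, int len, int flags)
{
    (void)flags;                        /* not used yet */
    int space = get_space();
    int n = len < space ? len : space;
    const unsigned char *src = data;
    for (int i = 0; i < n; i++)
        ring[(ring_start + ring_fill + i) % AO_BUFSIZE] = src[i];
    ring_fill += n;
    return n;
}

/* Unplayed bytes still buffered; timing divides this by o_bps. */
static int get_delay(void) { return ring_fill; }

/* Simulates the hardware playing 'bytes' of audio. */
static void hw_consume(int bytes)
{
    if (bytes > ring_fill)
        bytes = ring_fill;
    ring_start = (ring_start + bytes) % AO_BUFSIZE;
    ring_fill -= bytes;
}
```

Because play() copies the data, the caller can reuse its buffer immediately, and the caller keeps unconsumed bytes for the next call.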