So, I'll describe how this stuff works.

The main modules:

1. streamer.c: this is the input layer; it reads the file, the VCD, or
   stdin. What it has to know: appropriate buffering by sector, seek and
   skip functions, and reading by bytes or by blocks of any size. The
   stream_t structure describes the input stream (file/device).

2. demuxer.c: this demultiplexes the input into audio and video
   channels, and reads them as buffered packets.
   demuxer.c is basically a framework which is the same for all the
   input formats, and there is a parser for each of them (mpeg-es,
   mpeg-ps, avi, avi-ni, asf); these are in the demux_*.c files.
   The structure is demuxer_t. There is only one demuxer.

2.a. demux_packet_t, that is DP.
   Contains one chunk (avi) or packet (asf, mpg). They are stored in memory
   as a linked list, because they differ in size.

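A variable-sized packet kept in a linked list can be sketched as below. This is only an illustration: the struct layout, the field names and the queue_push()/queue_pop() helpers are assumptions, not the real demux_packet_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for demux_packet_t; field names are assumptions. */
typedef struct packet {
    size_t len;              /* payload size, differs per packet */
    double pts;              /* presentation timestamp in seconds */
    unsigned char *data;     /* payload bytes, owned by the packet */
    struct packet *next;     /* next packet in the list */
} packet;

typedef struct packet_queue {
    packet *first, *last;    /* oldest and newest packet */
    int count;               /* number of queued packets */
    size_t bytes;            /* total queued payload size */
} packet_queue;

/* Copy len bytes into a new packet and append it; returns 0 on success. */
static int queue_push(packet_queue *q, const void *src, size_t len, double pts)
{
    packet *p = malloc(sizeof *p);
    if (!p) return -1;
    p->data = malloc(len ? len : 1);
    if (!p->data) { free(p); return -1; }
    if (len) memcpy(p->data, src, len);
    p->len = len; p->pts = pts; p->next = NULL;
    if (q->last) q->last->next = p; else q->first = p;
    q->last = p;
    q->count++; q->bytes += len;
    return 0;
}

/* Detach and return the oldest packet, or NULL if the queue is empty. */
static packet *queue_pop(packet_queue *q)
{
    packet *p = q->first;
    if (!p) return NULL;
    q->first = p->next;
    if (!q->first) q->last = NULL;
    q->count--; q->bytes -= p->len;
    assert(q->count >= 0);
    p->next = NULL;
    return p;
}

/* Release a packet returned by queue_pop(). */
static void packet_free(packet *p) { if (p) { free(p->data); free(p); } }
```

A list fits here because each node carries its own length, so a 40-byte audio chunk and a 40 KiB video chunk can sit in the same kind of queue without a fixed slot size.
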
2.b. demuxer stream, that is DS.
   Struct: demux_stream_t
   Every channel (a/v) has one. It contains the packets for the stream
   (see 2.a). For now, there can be 3 for each demuxer:
   - audio (d_audio)
   - video (d_video)
   - DVD subtitle (d_dvdsub)

2.c. stream header. There are 2 types (for now): sh_audio_t and sh_video_t
   This contains every parameter essential for decoding, such as input/output
   buffers, chosen codec, fps, etc. There is one for every stream in
   the file: at least one for video, another if sound is present,
   and if there are more streams, one structure for each.
   These are filled in according to the header (avi/asf), or by demux_mpg.c
   (mpg) when it finds a new stream. If a new stream is found,
   the ====> Found audio/video stream: <id>  message is displayed.

   The chosen stream header and its demuxer stream are connected together
   (ds->sh and sh->ds) to simplify the usage. So it's enough to pass the
   ds or the sh, depending on the function.

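The two-way ds->sh / sh->ds link can be pictured with the reduced structs below. Everything except the sh and ds pointer names and i_bps/o_bps is an assumption made for the sketch; the real sh_audio_t and demux_stream_t carry many more fields.

```c
#include <assert.h>
#include <stddef.h>

struct sh_audio;  /* forward declaration so the two structs can point at each other */

/* Hypothetical, reduced demuxer stream: only the back-pointer is shown. */
typedef struct demux_stream {
    int id;                  /* stream id chosen for this channel, -1 if none */
    struct sh_audio *sh;     /* header of the stream feeding this channel */
} demux_stream;

/* Hypothetical, reduced audio stream header. */
typedef struct sh_audio {
    int i_bps;               /* compressed bytes per second */
    int o_bps;               /* uncompressed bytes per second */
    demux_stream *ds;        /* demuxer stream this header is attached to */
} sh_audio;

/* Attach a header to a channel so that either pointer leads to the other. */
static void link_stream(demux_stream *ds, sh_audio *sh, int id)
{
    assert(ds && sh);
    ds->id = id;
    ds->sh = sh;
    sh->ds = ds;
}

/* A decoder-side helper that only receives the header can still find its input. */
static int input_stream_id(const sh_audio *sh)
{
    return sh->ds ? sh->ds->id : -1;
}
```
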
   For example: we have an asf file with 6 streams inside it, 1 audio and 5
   video. While the header is read, 6 sh structs are created, 1
   audio and 5 video. When it starts reading packets, it chooses the
   streams of the first audio and video packets found, and sets the sh
   pointers of d_audio and d_video according to them. So later it reads
   only these streams. Of course the user can force choosing a specific
   stream with the -vid and -aid switches.
   A good example for this is the DVD, where the English stream is not
   always the first, so every VOB has a different language :)
   That's when we have to use, for example, the -aid 128 switch.

  Now, how does this reading work?
   - demuxer.c/demux_read_data() is called; it is told how many bytes
     we would like to read, where to (memory address), and from which
     DS. The codecs call this.
   - this checks if the given DS's buffer contains something; if so, it
     reads from there as much as needed. If there isn't enough, it calls
     ds_fill_buffer(), which:
   - checks if the given DS has buffered packets (DPs); if so, it moves
     the oldest to the buffer, and reads on. If the list is empty, it
     calls demux_fill_buffer():
   - this calls the parser for the input format, which reads the file
     onward, and moves the packets it finds to their buffers.
     Well, if we'd like an audio packet, but only a bunch of video
     packets are available, then sooner or later the
       DEMUXER: Too many (%d in %d bytes) audio packets in the buffer
     error shows up.

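The three-level call chain described above can be sketched with a toy demuxer. The function names carry a _sketch suffix to make clear they are not the real ones; the input format (strings tagged 'a' or 'v'), the fixed-size queue and the MAX_PACKS limit are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <string.h>

#define MAX_PACKS 4            /* assumed queue limit, stands in for "too many packets" */

/* Hypothetical, reduced demuxer stream: a small FIFO of packets plus a read cursor. */
typedef struct stream_sketch {
    const char *queue[MAX_PACKS];  /* buffered packets (DPs), oldest first */
    int queued;
    const char *buf;               /* packet currently being consumed */
    size_t buf_len, buf_pos;
} stream_sketch;

/* Hypothetical input: a NULL-terminated list of packets tagged 'a' or 'v'. */
typedef struct demuxer_sketch {
    const char **input;
    stream_sketch audio, video;
} demuxer_sketch;

/* Parser step: read one packet from the input and queue it on its stream.
   Returns 0 at end of input or when the target queue is full. */
static int demux_fill_buffer_sketch(demuxer_sketch *d)
{
    const char *p = *d->input;
    if (!p) return 0;
    stream_sketch *ds = (p[0] == 'a') ? &d->audio : &d->video;
    if (ds->queued == MAX_PACKS) return 0;   /* the "Too many ... packets" case */
    ds->queue[ds->queued++] = p + 1;
    d->input++;
    return 1;
}

/* Move the oldest queued packet into the stream buffer, parsing more input if needed. */
static int ds_fill_buffer_sketch(demuxer_sketch *d, stream_sketch *ds)
{
    while (ds->queued == 0)
        if (!demux_fill_buffer_sketch(d)) return 0;
    ds->buf = ds->queue[0];
    ds->buf_len = strlen(ds->buf);
    ds->buf_pos = 0;
    ds->queued--;
    memmove(ds->queue, ds->queue + 1, (size_t)ds->queued * sizeof ds->queue[0]);
    return 1;
}

/* Copy up to len bytes from the stream into mem; returns the number of bytes read. */
static size_t demux_read_data_sketch(demuxer_sketch *d, stream_sketch *ds,
                                     char *mem, size_t len)
{
    size_t done = 0;
    assert(mem);
    while (done < len) {
        if (ds->buf_pos >= ds->buf_len && !ds_fill_buffer_sketch(d, ds)) break;
        size_t n = ds->buf_len - ds->buf_pos;
        if (n > len - done) n = len - done;
        memcpy(mem + done, ds->buf + ds->buf_pos, n);
        ds->buf_pos += n;
        done += n;
    }
    return done;
}
```

Reading audio from an input that starts with video packets makes the parser queue those video packets as a side effect, which is exactly how one stream's queue can overflow while the other is being read.
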
So everything is OK so far; I want to move them to a separate lib.

Now, go on:

3. mplayer.c - ooh, he's the boss :)
   The timing is solved in an odd way, since it has to be (or is best)
   done differently for each of the formats, and sometimes can be done
   in many ways.

   There are float variables called a_frame and v_frame; they store
   the just played A/V position in seconds.

   The structure of the playing loop:
         while(not EOF) {
             fill audio buffer (read & decode audio) + increase a_frame
             read & decode a single video frame + increase v_frame
             sleep  (wait until a_frame>=v_frame)
             display the frame
             apply A-V PTS correction to a_frame
             check for keys -> pause,seek,...
         }

   When playing (a/v), it increases the variables by the duration of the
   played a/v.
   - with audio this is played bytes / sh_audio->o_bps
     Note: i_bps = number of compressed bytes for one second of audio
           o_bps = number of uncompressed bytes for one second of audio
                   (this is = bps*samplerate*channels, bps here being
                   bytes per sample)
   - with video this is usually == 1.0/fps, but I have to note that
     fps doesn't really matter for video; for example asf doesn't have it,
     instead there is "duration" and it can change per frame.
     MPEG2 has "repeat_count" which delays the frame by 1-2.5 ...
     Maybe only AVI and MPEG1 have fixed fps.

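As a worked example of the increments above, the helpers below compute o_bps and the per-step durations. The function names are hypothetical, and treating a positive frame_duration as "the container supplied a duration" is an assumption of this sketch.

```c
#include <assert.h>

/* Uncompressed bytes per second: bytes per sample * sample rate * channels. */
static int output_bps(int bytes_per_sample, int samplerate, int channels)
{
    assert(bytes_per_sample > 0 && samplerate > 0 && channels > 0);
    return bytes_per_sample * samplerate * channels;
}

/* Seconds of audio represented by a number of played (uncompressed) bytes. */
static double audio_duration(long played_bytes, int o_bps)
{
    assert(o_bps > 0);
    return (double)played_bytes / o_bps;
}

/* Seconds a video frame lasts: a per-frame duration if one is given
   (assumed > 0 in that case), otherwise 1/fps. */
static double video_duration(double frame_duration, double fps)
{
    return frame_duration > 0.0 ? frame_duration : 1.0 / fps;
}
```

For 16-bit stereo at 44100 Hz, o_bps is 2*44100*2 = 176400, so 88200 played bytes advance a_frame by 0.5 seconds, while one frame at 25 fps advances v_frame by 0.04 seconds.
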
   So everything works right as long as the audio and video are in perfect
   sync: the audio keeps playing and gives the timing, and when the
   time of a frame has passed, the next frame is displayed.
   But what if these two aren't synchronized in the input file?
   PTS correction kicks in. The input demuxers read the PTS (presentation
   timestamp) of the packets, and with it we can see if the streams
   are synchronized. Then MPlayer can correct a_frame, within
   a given maximal bound (see the -mc option). The sum of the
   corrections can be found in c_total.

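A bounded correction of this kind might look like the sketch below. Only a_frame and c_total are names from the text; the struct, the function, the sign convention (a positive av_diff means a_frame should grow) and the 0.1 damping factor are assumptions for illustration, not MPlayer's actual formula.

```c
#include <assert.h>

/* Hypothetical state; a_frame and c_total are the names used in the text. */
typedef struct av_sync {
    double a_frame;   /* audio position in seconds */
    double c_total;   /* running sum of all corrections applied */
} av_sync;

/* Nudge a_frame toward the measured A-V difference, never by more than
   max_correction per call (the role the text gives to -mc).
   The 0.1 damping factor is an assumption made for this sketch. */
static double apply_pts_correction(av_sync *s, double av_diff, double max_correction)
{
    double x = av_diff * 0.1;
    assert(max_correction >= 0.0);
    if (x > max_correction)  x = max_correction;
    if (x < -max_correction) x = -max_correction;
    s->a_frame += x;
    s->c_total += x;
    return x;
}
```

Clamping each step keeps a single bad timestamp from yanking the audio clock, at the cost of needing several frames to absorb a large real offset.
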
   Of course this is not everything, several things suck.
   For example the soundcard's delay, which has to be corrected by
   MPlayer! The audio delay is the sum of all these:
   - bytes read since the last timestamp:
     t1 = d_audio->pts_bytes/sh_audio->i_bps
   - if Win32/ACM then the bytes stored in audio input buffer
     t2 = a_in_buffer_len/sh_audio->i_bps
   - uncompressed bytes in audio out buffer
     t3 = a_buffer_len/sh_audio->o_bps
   - not yet played bytes stored in the soundcard's (or DMA's) buffer
     t4 = get_audio_delay()/sh_audio->o_bps

   From this we can calculate what PTS we need for the just played
   audio, then after we compare this with the video's PTS, we have
   the difference!

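The four terms can be added up as in the sketch below. The struct and function are hypothetical; card_bytes stands in for the value returned by get_audio_delay(). How the total is then combined with the last timestamp is not spelled out in the text, so that step is left out here.

```c
#include <assert.h>

/* Inputs named after the quantities in the list above; the struct itself is hypothetical. */
typedef struct audio_delay_in {
    long pts_bytes;        /* compressed bytes read since the last timestamp */
    long a_in_buffer_len;  /* compressed bytes in the input buffer (Win32/ACM), else 0 */
    long a_buffer_len;     /* uncompressed bytes waiting in the output buffer */
    long card_bytes;       /* unplayed bytes in the soundcard buffer */
    int i_bps;             /* compressed bytes per second */
    int o_bps;             /* uncompressed bytes per second */
} audio_delay_in;

/* Sum t1..t4 in seconds. Compressed byte counts are divided by i_bps,
   uncompressed ones by o_bps. */
static double audio_delay_seconds(const audio_delay_in *in)
{
    assert(in->i_bps > 0 && in->o_bps > 0);
    double t1 = (double)in->pts_bytes       / in->i_bps;
    double t2 = (double)in->a_in_buffer_len / in->i_bps;
    double t3 = (double)in->a_buffer_len    / in->o_bps;
    double t4 = (double)in->card_bytes      / in->o_bps;
    return t1 + t2 + t3 + t4;
}
```

With i_bps = 16000 and o_bps = 176400, the values 8000, 0, 88200 and 176400 bytes give 0.5 + 0 + 0.5 + 1.0 = 2.0 seconds.
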
   Life didn't get simpler with AVI. There's the "official" timing
   method, the BPS-based one: the header says how many compressed
   audio bytes belong to one second of frames.
   Of course this doesn't always work... why should it :)
   So I emulate MPEG's PTS/sector method on AVI, that is, the
   AVI parser calculates a fake PTS for every chunk read, decided by
   the type of the frames. This is how my timing is done. And sometimes
   this works better.

   In AVI, usually there is a bigger piece of audio stored first, then
   comes the video. This needs to be calculated into the delay; this is
   called "Initial PTS delay".
   Of course there are 2 of them: one is stored in the header and not
   really used :) the other isn't stored anywhere, it can only be
   measured...

3.a. audio playback:
   Some words on audio playback:
   The playing itself isn't hard, but:
   1. knowing when to write into the buffer, without blocking
   2. knowing how much of what we wrote has been played
   The first is needed for audio decoding, and to keep the buffer
   full (so the audio will never skip). And the second is needed for
   correct timing, because some soundcards delay by as much as 3-7 seconds,
   which can't be ignored.
   To solve this, OSS offers several possibilities:
   - ioctl(SNDCTL_DSP_GETODELAY): tells how many unplayed bytes are in
     the soundcard's buffer -> perfect for timing, but not all drivers
     support it :(
   - ioctl(SNDCTL_DSP_GETOSPACE): tells how much we can write into the
     soundcard's buffer without blocking. If the driver doesn't
     support GETODELAY, we can use this to know how much the delay is.
   - select(): should tell if we can write into the buffer without
     blocking. Unfortunately it doesn't say how much we could :((
     Also, it works badly or not at all with some drivers.
     Only used if none of the above works.

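The first two options can be combined roughly as follows on a system that ships the OSS header <sys/soundcard.h>. The function names are hypothetical, and deriving the delay from GETOSPACE as "total buffer size minus free bytes" is one plausible reading of the text, not necessarily the exact code MPlayer uses.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Return the number of unplayed bytes queued in the device behind fd,
   or -1 if neither ioctl works. */
static int oss_unplayed_bytes(int fd)
{
    int delay = 0;
    audio_buf_info info;

    /* Preferred: the driver reports the unplayed byte count directly. */
    if (ioctl(fd, SNDCTL_DSP_GETODELAY, &delay) != -1)
        return delay;

    /* Fallback: whatever part of the device buffer is not free must still
       be waiting to be played. */
    if (ioctl(fd, SNDCTL_DSP_GETOSPACE, &info) != -1)
        return info.fragstotal * info.fragsize - info.bytes;

    return -1;
}

/* Convert the byte count into seconds using the uncompressed byte rate. */
static double oss_delay_seconds(int unplayed_bytes, int o_bps)
{
    assert(o_bps > 0);
    return (double)unplayed_bytes / o_bps;
}
```

The select() path is omitted because, as noted above, it says nothing about how many bytes are pending and so cannot provide a delay figure.
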
4. Codecs. They are separate libs.
   For example libac3, libmpeg2, xa/*, alaw.c, opendivx/*, loader, mp3lib.
   mplayer.c calls them if a piece of audio or video needs to be played.
   (see the beginning of 3.)
   And they call the appropriate demuxer to get the compressed data.
   (see 2.)
   We have to pass the appropriate stream header as parameter (sh_audio/
   sh_video); this should contain all the needed info for decoding
   (the demuxer too: sh->ds).
   Separating the codecs is underway: the audio is already done, the video is
   work-in-progress. The aim is that mplayer.c won't have to know
   which codecs there are and how to use 'em; instead it should call
   an init/decode audio/video function.

5. libvo: this displays the frame.
   The constants for different pixelformats are defined in img_format.h;
   their usage is mandatory.

   Each vo driver _has_ to implement these:

    query_format()  -  queries if a given pixelformat is supported.
        return value:   flags:
            0x1  -  supported (by hardware or conversion)
            0x2  -  supported (by hardware, without conversion)
            0x4  -  sub/osd supported (has draw_alpha)
        IMPORTANT: it's mandatory that every vo driver support the YV12 format,
        and one (or both) of BGR15 and BGR24, with conversion, if needed.
        If these aren't supported, not every codec will work! The mpeg codecs
        can output only YV12, and the older win32 DLLs only 15 and 24bpp.
        There is a fast MMX-using 15->16bpp converter, so it's not a
        significant speed-decrease!

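A query_format() for an imaginary driver could return the flags like this. The macro names and the FMT_* ids are made up for the sketch (a real driver would use the constants from img_format.h), and setting 0x1 together with 0x2 for the native format is an assumption.

```c
#include <assert.h>

/* Flag values as listed above; the macro names are made up for this sketch. */
#define VO_SUPPORTED      0x1   /* supported (by hardware or conversion) */
#define VO_SUPPORTED_HW   0x2   /* supported by hardware, without conversion */
#define VO_SUPPORTS_OSD   0x4   /* sub/osd supported (has draw_alpha) */

/* Placeholder format ids; real code would use the img_format.h constants. */
enum sketch_format { FMT_YV12, FMT_BGR15, FMT_BGR16, FMT_BGR24, FMT_YUY2 };

/* query_format() of an imaginary driver whose display runs at 16bpp:
   BGR16 goes straight to hardware, BGR15 and YV12 are converted,
   everything else is refused. */
static unsigned int query_format_sketch(enum sketch_format format)
{
    switch (format) {
    case FMT_BGR16: return VO_SUPPORTED | VO_SUPPORTED_HW | VO_SUPPORTS_OSD;
    case FMT_BGR15: return VO_SUPPORTED | VO_SUPPORTS_OSD;
    case FMT_YV12:  return VO_SUPPORTED;
    default:        return 0;
    }
}

/* Caller-side check: nonzero if the format can be shown without conversion. */
static int is_native(unsigned int flags)
{
    assert((flags & VO_SUPPORTED_HW) == 0 || (flags & VO_SUPPORTED) != 0);
    return (flags & VO_SUPPORTED_HW) != 0;
}
```

This imaginary driver satisfies the rule above: it takes YV12 and BGR15, both through conversion.
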
        The BPP table, if the driver can't change bpp:
            current bpp         has to accept these
                 15                     15
                 16                   15,16
                 24                     24
               24,32                  24,32

        If it can change bpp (for example DGA 2, fbdev, svgalib), then if possible
        we have to change to the desired bpp. If the hardware doesn't support it,
        we have to change to the one closest to it, and do conversion!

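The BPP table can be turned into a lookup as sketched here. The function name is hypothetical, and the table's last row ("24,32") is read as applying to a 32bpp display, which is an interpretation rather than something the text states.

```c
#include <assert.h>

/* Nonzero if a driver stuck at current_bpp has to accept requested_bpp.
   Assumption: the table's last row ("24,32") is read as a 32bpp display. */
static int must_accept_bpp(int current_bpp, int requested_bpp)
{
    assert(current_bpp > 0 && requested_bpp > 0);
    switch (current_bpp) {
    case 15: return requested_bpp == 15;
    case 16: return requested_bpp == 15 || requested_bpp == 16;
    case 24: return requested_bpp == 24;
    case 32: return requested_bpp == 24 || requested_bpp == 32;
    default: return 0;
    }
}
```
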
    init()  -  this is called before the first frame is displayed -
        initializing buffers, etc.

    draw_slice(): this displays YV12 pictures (3 planes: one full-sized that
        contains brightness (Y), and 2 quarter-sized which contain the colour
        info (U,V)). MPEG codecs (libmpeg2, opendivx) use this. It doesn't have
        to display the whole frame, it can update only small parts of it.

    draw_frame(): this is the older interface; it displays only complete
        frames, and can handle only packed formats (YUY2, RGB/BGR).
        Win32 codecs use this (DivX, Indeo, etc).

    draw_alpha(): this displays subtitles and OSD.
        It's a bit tricky to use, since it's not part of the libvo API
        but a callback-style thing. flip_page() has to call
        vo_draw_text(), passing it the size of the screen and the
        draw_alpha() implementation matching the pixelformat
        (function pointer). vo_draw_text() checks the characters to draw,
        and calls draw_alpha() for each.
        As a help, osd.c contains a draw_alpha for each pixelformat; use these
        if possible!

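The callback arrangement can be illustrated with the stand-ins below. None of these are the real libvo signatures: the parameter lists, the fixed 8x8 glyph size and the _sketch/_count names are assumptions chosen only to show the flow flip_page() -> text drawer -> draw_alpha().

```c
#include <assert.h>

/* Callback type: blend a w*h glyph bitmap at (x0, y0). The parameter list
   is an assumption made for this sketch. */
typedef void (*draw_alpha_fn)(int x0, int y0, int w, int h,
                              const unsigned char *src, int stride);

static int glyphs_drawn;   /* counts callback invocations, for demonstration only */

/* Stand-in for a per-pixelformat draw_alpha(); a real one would blend into
   the driver's frame buffer. */
static void draw_alpha_count(int x0, int y0, int w, int h,
                             const unsigned char *src, int stride)
{
    (void)x0; (void)y0; (void)w; (void)h; (void)src; (void)stride;
    glyphs_drawn++;
}

/* Stand-in for vo_draw_text(): walks the text and calls the callback once
   per character that fits on a screen_w pixels wide screen. */
static void draw_text_sketch(int screen_w, int screen_h, const char *text,
                             draw_alpha_fn draw_alpha)
{
    static const unsigned char glyph[8 * 8] = {0};   /* dummy 8x8 bitmap */
    int x = 0;
    assert(draw_alpha && text);
    for (; *text && x + 8 <= screen_w; text++, x += 8)
        draw_alpha(x, screen_h - 8, 8, 8, glyph, 8);
}

/* What a driver's flip_page() would do before showing the buffer. */
static void flip_page_sketch(void)
{
    draw_text_sketch(320, 240, "OSD", draw_alpha_count);
    /* ...then present the finished buffer... */
}
```

Passing the function pointer keeps the text-drawing code independent of the pixelformat: the driver picks the matching draw_alpha() once and hands it over on every flip.
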
    flip_page(): this is called after each frame; it displays the buffer for
        real. This is 'swapbuffers' when double-buffering.