//First Announce by Ivan Kalvachev
//Some explanations by Arpi & Pontscho 

If you have any suggestion related to the subjects in this document you
could send them to mplayer developer or advanced users mail lists. If you are
developer and have CVS access do not delete parts of this document, but you
could feel free to add paragraphs that you will sign with your name. 
Be warned that the text could be changed, modified and your name could be 
moved at the top of the document. 

1.libvo2 drivers 
1.1 functions
Currently these functions are implemented:
  init
  control
  start
  stop
  get_surface
  update_surface - renamed draw
  show_surface - renamed flip_page
  query
  hw_decode
  subpicture

Here is detailed description of the functions:
  init - initialisation. It is called once on mplayer start
  control - this function is message oriented interface for controlling the libvo2 driver
  start - sets given mode and display it on the screen
  stop - closes libvo2 driver, after stop we may call start again
  query - the negotiation is more complex than just finding which imgfmt the
  device could show, we must have list of capabilities, etc.
  This function will have at least 3 modes:
    a) return list with description of available modes.
    b) check could we use this mode with these parameters. E.g. if we want 
       RGB32 with 3 surfaces for windows image 800x600 we may get out of video
       memory. We don't want error because this mode could be used with 2
       surfaces.
    c) return supported subpicture formats if any.
   +d) supported functionality by hw_decode

As you may see I have removed some functionality from control() and made
separate function. Why? It is generally good thing functions that are
critical to the driver to have it's own implementation.
  get_surface - this function give us surfaces where we could write. In most
    cases this is video memory, but it is possible to be and computer RAM, with
    some special meaning (AGP memory , X shared memory, GL texture ...).

  update_surface - as in the note above, this is draw function. Why I change
    it's name? I have 2 reasons, first I don't want implementation like vo1,
    second it really must update video surface, it must directly call the
    system function that will do it. This function should work only with
    slices, the size of slice should not be limited and should be passed 
    (e.g ystart, yend), if we want draw function, we will call one form libvo2
    core, that will call this one with ystart=0; yend=Ymax;. Also some system
    screen update functions wait for vertical retrace before return, other
    functions just can't handle partial updates. In this case we should inform
    libvo2 core that device cannot slice, and libvo2 core must take care of
    the additional buffering and update_surface becomes usual draw function.
    When update_surface() is used with combination on get_surface(), ONLY VALID
    POINTERS ARE THESE RETURNED BY get_surface(). Watch out with cropping.

  show_surface - this functions is always called on frame change. it is used
    to show the given surface on the screen.
    If there is only one surface then it is always visible and this function 
    does nothing.

  hw_decode - to make all dvb,dxr3, TV etc. developers happy. This function
    is for you. Be careful, don't OBSEBE it, think and for the future, this
    function should have and ability to control HW IDCT, MC that one day will
    be supported and under linux. Be careful:)

  subpicture - this function will place subtitles. It must be called once to
    place them and once to remove them, it should not be called on every
    frame, the driver will take care of this.  Currently I propose this
    implementation: we get array of bitmaps. Each one have its own starting
    x, y and it's own height and width, each one (or all together) could be
    in specific imgfmt (spfmt). THE BITMAPS SHOULD NOT OVERLAP! This may not
    be hw limitation but sw subtitles may get confused if they work as 'c'
    filter (look my libvo2 core). Anyway, so far I don't know hardware that
    have such limitations, but it is safer to be so (and faster I think).
    It is generally good to merge small bitmaps (like characters) in larger
    ones and make all subtitles as one bitmap( or one bitmap for one subtitle line). 
    There will be and one for each OSD time & seek/brightness/contrast/volume bar.
    
1.2 control()
OK, here is list of some control()s that I think that could be useful:
    SET_ASPECT
    SET_SCALLE_X, SET_SIZE_X
    SET_SCALLE_Y, SET_SIZE_Y
    RESET_SIZE
    GET/SET_POSITION_X
    GET/SET_POSTIION_Y
    GET/SET_RESOLUTION
    GET/SET_DISPLAY
    GET/SET_ATTRIBUTES
+   GET/SET_WIN_DECORATION

Here is description of how these controls to be used:

  SET_ASPECT - this is the move/video aspect, why not calculate it in
  different place (mplayer.c) and pass the results to driver by
  set_size_x/y. First this is only if hardware could scale. Second we may
  need this value if we have TV and we won't calculate any new height and
  width.

  SET_SCALLE_X/Y - this is to enlarge/downscale  the image, it WILL NOT
  override SET_ASPECT, they will have cumulative effect, this could be used
  for deinterlacing (HALF SIZE). Second if we want to zoom 200% we don't
  want to lose aspect calculations. Or better SET_SCALLE to work with
  current size?

  SET_SIZE_X/Y - This is for custom enlarge, to save some scale calculation
  and for more precise results.

  RESET_SIZE - Set the original size of image, we must call SET_ASPECT again.

  GET/SET_POSOTION_X/Y - This if for windows only, to allow custom move on
  window.

  GET/SET_RESOLUTION - change resolution and/or bpp if possible. To be used
  for changing desktop resolution or the resolution of the current
  fullscreen mode (NOT TO SET IT just to change it if we don't like it)

  GET/SET_DISPLAY - mainly for X11 and remote displays. Not very useful, but
  may be handy.

  GET/SET_ATTRIBUTES - Xv overlays have contrast, brightness, hue,
  saturation etc. these and others could be controlled by this. If we want
  to query it we must call GET_*, and the to check does our attribute is in
  there (xv developers be careful, 2 or 3 of default attributes sometimes
  are not queried by X, but could be set).

Do you think that TV encoding (NTSC,PAL,SECAM) should have it's own attribute?
I would like to hear the GUI developers. Could we separate Mouse/Keyboard
from the driver. What info do you need to do it. Don't forget that SDL have
it's own keyboard/mouse interface. Maybe we should allow video driver to
change the libin driver ?

<SOP>
Arpi wrote:
I've asked Pontscho (he doesn't understand english well...).
There is 2 option of GUI<->mplayer interface.

The current, ugly (IMHO) way:
gui have the control of the video window, it does handle resizing, moving,
key events etc. all window manipulation in libvo drivers are disabled as gui
is enabled. it was required as libvo isn't inited and running when gui
already display the video window.

The wanted way:
GUI shouldn't control the X window directly, it should use libvo2 control
calls to resize/move/etc it. But there is a big problem: X cannot be opened
twice from a process. It means GUI and libvo2 should share the X connection.
And, as GUI run first (and when file is selected etc then libvo2 is started)
it should connect to X and later pass the connection to libvo2. It needs an
extra control() call and some extra code in mplayer.c

but this way gui could work with non-X stuff, like SDL, fbdev (on second
head for TVout etc), hardware decoders (dvb.dxr3) etc.

as X is so special, libvo2 should have a core function to open/get an X
connection, and it should be used by all X-based X drivers and gui.

also, GUI needs functions to get mouse and keyboard events, and to
enable/disable window decoration (title, border).

we need fullscreen switch control function too.

> Maybe we should allow video driver to change the libin driver ? 
forget libin. most input stuff is handled by libvo drivers.
think of all X stuff (x11,xv,dga,xmga,gl), SDL, aalib, svgalib.
only a few transparent drivers (fbdev, mga, tdfxfb, vesa) has not, but all
of them are running on console (and maybe on second head) at fullscreen, so
they may not need mouse events. console keyboard events are already catched
and handled by getch2.

I can't see any sense of writing libin.

mpalyer.c should _handle_ all input events, collected from lirc interface,
getch2, libvo2 etc. and it should set update flags, for gui and osd.

but we should share some plugin code. examples: *_vid code, all common X
code. it can be either implementing them in libvo2 core (and called from
plugins) or include these files from all drivers which need it. later method
is a bit cleaner (from viewpoint of core-plugin independency) but results
bigger binaries...
<EOP, Arpi>

Btw. when we finish we will have libin, but it will be spread around mplayer. 
I agree that libin could be build in in libvo2 driver, but there have to be
standart way to send commands to the mplayer itself.


1.3. query()

Here come and some attributes for the queried modes, each supported mode
should have such description. It is even possible to have more than one mode
that could display given imgfmt. I think that we have to separate window from fullscreen
modes and to have yv12 mode for window and yv12 fullscreen mode. We need and naming 
scheme, in order to have *.conf control over modes - to disable buggy modes, to limit
surfaces (buggy ones), to manually disable slices etc. The naming should not change from
one computer to another and have to be flexible.
{
  IMGFMT - image format (RGB,YV12, etc...)

  Height - the height of fullscreen mode or the maximum height of window mode

  Width - the width of fullscreen mode or the maximum withd of window mode

}
{
  Scale y/n  - hardware scale, do you think that we must have one for x and
  one for y (win does)?

  Fullscreen y/n - if the supported mode is fullscreen, if we have yv12 for
  fullscreen and window we must threat them as separate modes.
  
  Window y/n - The mode will show the image in a window. Could be removed as 
  it is mutually exclusive with Fullscreen

  GetSurface y/n - if driver could give us video surface we'll use get_surface()

  UpdateSurfece y/n - if driver will update video surface through sys function (X,SDL)

  HWdecode y/n  - if driver could take advantage of hw_decode()

  MaxSurfaces 1..n - Theoretical maximum of surfaces

  SubPicture y/n - Could we put subpicture (OSD) of any kind by hw

  WriteCombine y/n - if GetSurface==yes, most (or all) pci&agp cards are
  extremely slow on byte access, this is hint to vo2 core those surfaces
  that got affected by WC. Some surfaces are in memory (X shm, OpenGL textures)
  This is only a hint.

  us_clip y/n - if UpdateSurface=yes, this shows could update_surface()
  remove strides (when stride> width ), this is used and for cropping. If
  not, we must do it.

  us_slice y/n - if UpdateSurface=yes, this shows that update_surface()
  could draw slices and that after updating surface,it won't wait for 
  vertical retrace, so we could update surface slice by slice. 
  If us_slice==n we will have to accumulate all slices in some buffer.

  us_upsidedown - if UpdateSufrace=yes, this shows that update_suface()
  could flip the image vertically. In some case this could be combined with
  us_clip /stride tricks/

  switch_resoliton y/n - if window=y, this shows could we switch resolution
  of desktop, if fullscreen==y, shows that we could change resolution, after
  we have set the fullscreen mode.

  deinterlace y/n - indicates that the device could deinterlace on it's own
  (radeon, TV out).
}
1.4 conclusion 

As you see, I have removed all additional buffering from the driver. There
is a lot of functionality that should be checked and handled by libvo2 core.
If some of the functionality is not supported the libvo2 core should add filters
that will support it by software.

Some of the parameters should be able to
 be overridden by user config, mainly 
to disable buggy modes or parameters. I
 believe that this should not be done 
by command line as there are enough
 commands now.

I wait comments and ideas.
//--------------------------------------------------------------------------
2. libvo2 core
2.1 functions
now these function are implemented:
    init
    new
    start
    query_format
    close

and as draw.c:
    choose_buffering
    draw_slice_start
    draw_slice
    draw_frame
    flip

init() is called at mplayer start. internal initialisation.
new() -> rename to open_drv() or something like this.
query_format -> not usable in this form, this function mean that all
  negotiation will be performed outside libvo2. Replace or find better name.
close -> open/close :)

choose_buffering - all buffering must stay hidden. The only exception is for
  hw_decode. In the new implementation this functions is not usable.
  It will be replaced with some kind of negotiation.
draw_slice_start, draw_slice -> if you like it this way, then it's OK. But i think that
draw_slice_done could help.

draw_frame -> classic draw function.

2.2 Minimal buffering

I should say that I stand after the idea all buffering, postprocessing,
format conversion , sw draw of subtitles, etc to be done in libvo2 core.
Why? First this is the only way we could fully control buffering and
decrease it to minimum. Less buffers means less coping. In some cases this
could have the opposite effect (look at direct rendering).

The first step of the analyse is to find out what we need:

DECODER   -   num_out_buffers={1/2/3/...}
              {
                buffer_type:{fixed/static/movable}
                read_only:{yes/no}
              } * (num_out_buffers)
              slice:{not/supported}

FILTER 1..x - processing:{ c-copy(buff1,buff2), p-process(buff1) }, 
              slice:{not/supported}
              write_combine:{not/safe}, 
              runtime_remove:{static/dynamic}

VIDEO_OUT  -  method:{get_surface,update_surface}, 
              slice:{not/supported}, 
              write_combine:{not/safe},
              clip:{can/not},
              upsidedown:(can/not),
              surfaces:{1/2/3,..,n}


I use one letter code for the type of filters. You could find them in filters section.
Details: 

DECODER - We always get buffer from the decoder, some decoders could give
  pointer to it's internal buffers, other takes pointers to buffers where
  they should store the final image. Some decoders could call draw_slice
  after they have finished with some portion of the image.

  num_out_buffers - number of output buffers. Each one could have it's own
  parameters. In the usual case there will be only one buffer. Some
  decoders may have 2 internal buffers like odivx, or like mpeg12 - 3 buffers
  of different types(2 static and 1 temp).

  buffer_type -
    - fixed  - we don't have control where the buffer will be. We could 
    just take pointer to this buffer. No direct rendering is possible.
    - static - we could set this buffer but then we can't change it's position.
    - movable - we could set this buffer to any location at any time.
  read_only - the data in this buffer will be used in future so we must not 
  try to write in there or we'll corrupt the video. If we have any 'p' kind
  of filter we'll make copy.

  slice - this flag shows that decoder knows and want to work with slices.

FILTER - postprocessing, sw drawing subtitles, format conversion, crop,
external filters.

  slice - could this filter work with slice order. We could use slice even
  when decoder does not support slice, we just need 2 or more filters that
  does. This could give us remarkable speed boost.

  processing - some filters can copy the image from one buffer to the other,
  I call them 'c', convert and crop(stride copy) are good examples but don't
  forget simple 1:1 copy. Other filters does process only part if the image,
  and could reuse the given buffer, e.g. putting subtitles. I call them 'p'
  Other filters could work in one buffer, but could work and with 2, I call 
  them 't' class, after analyse they will fade to 'c' or 'p'. 

  runtime_remove - postprocess with autoq. Subtitles appear and disappear,
  should we copy image from one buffer to another if there is no processing
  at all?

//clip, crop, upsidedown - all 'c' filters must support strides, and should
  be able to remove them and to make some tricks like crop and upside_down.

VIDEO_OUT - take a look of libvo2 driver I propose.
  method - If we get surface -'S'. If we use draw* (update_surface) - 'd'

As you may see hw_decode don't have complicated buffering:)

I make the analyse this way. First I put decoder buffer, then I put all
filters, that may be needed, and finally I put video out method. Then I add
temp buffers where needed. This is simple enough to be made on runtime.

2.5 Various
2.5.1 clip&crop - we have x1,y1 that shows how much of the beginning and 
x2,y2 how much of the end we should remove.
    plane+=(x1*sizeof(pixel))+(y1*stride);//let plane point to 1'st visible pixel
    height-=y1+y2;
    width-=x1+x2;
  isn't is simple? no copy just change few variables. In order to make normal
plane we just need to copy it to frame where stide=width;

2.5.2 flip,upsidedown - in windows this is indicated by negative height, here
  in mplayer we may use negative stride, so we must make sure that filters and
  drivers could use negative stride
    plane+=(width-1)*stride;//point to the last line
    stride=-stride;//make stride point to previus line
  and this one is very simple, and I hope that could work with all know image formats

  BE careful,  some modes may pack 2 pixels in 1 byte!
  Other modes (YUYV) require y1 to be multiply of 2.

  stride is always in bytes, while width & height are in pixels

2.5.3 PostProcessing
Arpi was afraid that postprocessing needs more internal data to work. I think
that the quantization table should be passed as additional plane. 
How to be done this? When using Frame structure there is qbase that should point
to quantization table. The only problem is that usually the table is with fixed
size. I expect recommendations how to be properly implemented. should we crop it? Or add qstride, qheight, qwidth? Or mark the size of marcoblocks and
calc table size form image size? Currently pp work with fixed 8x8 blocks.
There may have and problem with interlaced images. 
/ for frame look at 2.3.4 /
I recommend splitting postprocessing to it's original filters and ability to
use them separately.

2.3. Rules for minimal buffering
2.3.1 Direct rendering. 
Direct rendering means that the decoder will use video surface as output buffer. 
  Most of the decoders have internal buffers and on request they copy 
the ready image from one of them to a given location. As we can't get pointer
to the internal buffer the fastest way is to give video surface as 
output buffer and the decoder will draw it for us. This is safe as most of 
copy routines are optimised for double words aligned access.
  If we get internal buffer, we could copy the image on our own. This is not 
direct rendering but it gets the same speed. If fact that's why -vc odivx 
is faster that -vc divx4 while they use the same divx4linux library.
  Sometimes it's possible to set video surface as internal buffer, but in most
cases the decoding process is byte oriented and many unaligned access is 
performed. Moreover, reading from video memory on some cards is extremely 
slow, about 4 times and more (and this is without setting MTRR), but some 
decoders could take advantage of this. In the best case (reading performed
from the cache and using combined write ) we'll watch DivX movie with the same
speed as DivX4 is skipping frames. 

What does we need for Direct Rendering? 
1. We should be able to get video surfaces. 
2. The decoder should have at least one buffer with buffer_type != fixed.
3. If we have 'c' filter we can not use direct rendering. If we have 
   'p' filter we may allow it.
4. If decoder have one static buffer, then we are limited to 1 video surface.
   In this case we may see how the frame is rendered (ugly refresh in best case)
5. Each static buffer and each read_only buffer needs to have it own
   video surface. If we don't have enough ... well we may make some tricks 
   but it is too complicated //using direct rendering for the first in
   the list and the rest will use memory buffering. And we must have (1 or 2 ) free 
   video surfaces for the rest of decoder buffers//
6. Normal (buffer_type=movable, read_only=no) buffer could be redirected to
   any available video surface.

2.3.2 Normal process
  The usual case libvo2 core takes responsibility to move the data. It must
follow these rules:
  1. The 'p' filters process in the buffer of the left, if we have read_only
buffer then vo2 core must insert 'c' copy filter and temp buffer. 
  2. With 'c' filter we must make sure that we have buffer on the right(->) side. I think
that  
  3. In the usual case 't' are replaced with 'p' except when 't' is before video surface.
We must have at least one 'c' if core have to make crop, clip, or flip image
upside down.
  4. Take care for the additional buffering when we have 1 surface (the libvo1 way).
  5. Be aware that some filters must be before other. E.g. Postporcessing should
be before subtitles:)
  6. If we want scale (-zoom), and vo2 driver can't make it then add and scale
filter 'c'. For better understanding I have only one convert filter that can
copy, convert, scale, convert and scale. In mplayer it really will be only
one filter.
  7. If we have video surface then the final 'c' filters will update it for us. If the filter
and video surface are not WriteCombine safe we may add buffering. In case we use both 
get_surface and update_surface, after writing in video surface we must call and
update_sufrace() function. 

If we must update_surface() then we will call it with the last buffer. This buffer could
be and the internal decoder buffer if there are no 'c' filters. This buffer could be 
returned and by get_surface().

2.3.3 Slices.
  Slice is a small rectangle of the image. In decoders world it represents 
  independently rendered portion of the image. In mplayer slice width is equal 
  to the image width, the height is usually 8 but there is no problem to vary. 
  The slices advantage is that working with smaller part of the image the most 
  of data stays in the cache, so post processing would read the data for free. 
  This makes slice processing of video data preferred even when decoder and/or 
  video driver couldn't work with slices.
  Smaller slices increase possibility of data to be in the cache, but also 
  increase the overhead of function calls( brunch prediction too), so it may be 
  good to tune the size, when it is possible (mainly at 2 filter slices)

  Here are some rules:
1. Slices are always with width of the image
2. Slices always are one after another, so you could not skip few lines because 
   they are not changed. This is made for postprocessing filter as there may 
   have different output image based on different neighbourhood lines(slices). 
3. Slice always finish with last line, this is extended of 2. rule.
4. Slice buffers are normal buffers that could contain a whole frame. This is 
   need in case we have to accumulate slices for frame process (draw). This is 
   needed and for pp filters.
5. Slice processing could be used if:
5.1. decoder know for slices and call function when one is completed. The next 
   filter (or video driver) should be able to work with slices.
5.2. Two or more filters could work with slices. Call them one after another. 
   The result will be accumulated in the buffer of the last filter (look down 
   for 'p' type)
5.3. If the final filter can slice and vo2_driver can slice
6. All filers should have independent counters for processed lines. These counters
must be controlled by vo2 core.

2.3.3.1 Slice counters.
For the incoming image we need:
1. value that show the last valid line. 
2. value that show the line from where filter will start working. It is updated by the 
filter to remember what portion of the image is processed. Vo2 core will zero it
on new frame.

For the result image we need:
1. value that show which line is ready. This will be last valid line for next filter.

The filter may need more internal variables. And as it may be used 2 or more times
in one chain it must be reentrant. So that internal variables should be passed to
filter as parameter.

2.3.3.2 Auto slice.
In case we have complete frame that will be processed by few filters that support slices, we must start processing this frame slice by slice. We have same situation 
when one filter accumulates too many lines and forces the next filters to work with bigger slice.
To avoid that case and to automatically start slicing we need to limit the slice size
and when slice is bigger to break it apart. If some filter needs more image lines then
it will wait until it accumulates them.

2.3.4. Frame structure
 So far we have buffer, that contain image, we have filters that work with 
 buffers. For croping and for normal work with the image data we need to know 
 dimensions of the image. We also need some structure to pass to the filters as
 they have to know from where to read, and where they should write.
So I introduce Frame struct:
{
imgfmt - the image format, the most important parameter
height, width - dimensions in pixel
stride - size of image line in bytes, it could be larger then width*sizeof(pixel), 
         it could be and negative (for vertical flip)
base0,base1,base2,base3 - pointers to planes, thay depend on imgfmt
baseq - quant storage plane. we may need to add qstride, or some qhight/qwidth
palette - pointer to table with palette colors.
flags read-only - this frame is read only.
//screen position ??
}


2.4 Negotiation
Few words about negotiation. It is hard thing to find the best mode. Here is
algorithm that could find the best mode. But first I must say that we need
some kind of weight for the filters and drawing. I think that we could use
something like megabytes/second, something that we may measure or benchmark.

  1. We choose codec
  2. We choose video driver.
  3. For each combination find the total weight and if there are any
  optional filters find min and max weight. Be careful max weight is not
  always at maximum filters!! (e.g. cropping)
  4. Compare the results.

I may say that we don't need automatic codec selection as now we could put
best codecs at beginning of codecs.conf as it is now. We may need to make
the same thing with videodrv.conf. Or better make config files with preferred 
order of decoders and video modes:)

I wait comments and ideas.