mpv/DOCS/tech/encoding-tips.txt


ENCODING QUALITY - OR WHY AUTOMATISM IS BAD.

Hi everyone.

Some days ago someone suggested adding some preset options to mencoder.
At that time I replied 'don't do that', and now I decided to elaborate
on that.

Warning: this is rather long, and it involves mathematics. But if you
don't want to bother with either then why are you encoding in the
first place? Go do something different!

The good news is: it's all about the bpp (bits per pixel).

The bad news is: it's not THAT easy ;)

This mail is about encoding a DVD to MPEG4. It's about the video
quality, not (primarily) about the audio quality or some other fancy
things like subtitles.

The first step is to encode the audio. Why? Well if you encode the
audio prior to the video you'll have to make the video fit onto one
(or two) CD(s). That way you can use oggenc's quality based encoding
mode which is much more sophisticated than its ABR based mode.

After encoding the audio you have a certain amount of space left to
fill with video. Let's assume the audio takes 60M (no problem with
Vorbis), and you aim at a 700M CD. This leaves you 640M for the video.
Let's further assume that the video is 100 minutes or 6000 seconds
long, encoded at 25fps (those nasty NTSC fps values give me
headaches. Adjust to your needs, of course!). This leaves you with
a video bitrate of:

                $videosize * 8 
$videobitrate = --------------
                $length * 1000

$videosize in bytes, $length in seconds, $videobitrate in kbit/s.
In my example I end up with $videobitrate = 895.
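
The formula is easy to script. What follows is a minimal Python sketch, not
part of the original mail. The function name is made up for illustration, and
1M is assumed to mean 1024 * 1024 bytes, which is what reproduces the
895 kbit/s figure.

```python
def video_bitrate_kbps(video_size_bytes, length_seconds):
    """Video bitrate in kbit/s that fills video_size_bytes in length_seconds."""
    return video_size_bytes * 8 / (length_seconds * 1000)

# Example from the text: 700M CD minus 60M of audio, 100 minutes of video.
# Assumption: 1M = 1024 * 1024 bytes.
video_size = (700 - 60) * 1024 * 1024
length = 100 * 60
print(round(video_bitrate_kbps(video_size, length)))  # prints 895
```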

And now comes the question: how do I choose my encoding parameters
so that the results will be good? First let's take a look at a
typical mencoder line:

mencoder -dvd 1 -o /dev/null -oac copy -ovc lavc \
  -lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:\
  vlelim=-4:vcelim=9:lumi_mask=0.05:dark_mask=0.01:vpass=1 \
  -vop scale=640:480,crop=716:572:2:2

Phew, all those parameters! Which ones should I change? NEVER leave
out 'vhq'. Never ever. 'vqmin=2' is always good if you aim for sane
settings - like 'normal length' movies on one CD, 'very long movies'
on two CDs and so on. vcodec=mpeg4 is mandatory.

The 'vlelim=-4:vcelim=9:lumi_mask=0.05:dark_mask=0.01' are parameters
suggested by D Richard Felker for non-animated movies, and they
improve quality a bit.

But the two things that have the most influence on quality are
vbitrate and scale. Why? Because both together tell the codec how
many bits it may spend on each pixel of each frame: and this is
the 'bpp' value (bits per pixel). It's simply defined as

         $videobitrate * 1000       
$bpp = -----------------------
       $width * $height * $fps
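
As a rough illustration (this is not the attached Perl script, and the
function name is an assumption), the same definition in Python:

```python
def bits_per_pixel(video_bitrate_kbps, width, height, fps):
    """bpp = bitrate in bit/s divided by the number of pixels per second."""
    return video_bitrate_kbps * 1000 / (width * height * fps)

# 896 kbit/s at 720x304 and 25 fps
print(f"{bits_per_pixel(896, 720, 304, 25):.3f}")  # prints 0.164
```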

I've attached a small Perl script that calculates the $bpp for
a movie. You'll have to give it four parameters:
a) the cropped but unscaled resolution (use '-vop cropdetect'),
b) the encoded aspect ratio. All (PAL) DVDs come at 720x576 but contain
a flag that tells the player whether it should display the DVD at
an aspect ratio of 4/3 (1.333) or at 16/9 (1.778). Have a look
at mplayer's output - there's something about 'prescaling'. That's
what you are looking for.
c) the video bitrate in kbit/s and
d) the fps.

In my example the command line and calcbpp.pl's output would look
like this (warning - long lines ahead):

mosu@anakin:~$ ./calcbpp.pl 720x440 16/9 896 25
Prescaled picture: 1023x440, AR 2.33
720x304, diff   5, new AR 2.37, AR error 1.74% scale=720:304 bpp: 0.164
704x304, diff  -1, new AR 2.32, AR error 0.50% scale=704:304 bpp: 0.167
688x288, diff   8, new AR 2.39, AR error 2.58% scale=688:288 bpp: 0.181
672x288, diff   1, new AR 2.33, AR error 0.26% scale=672:288 bpp: 0.185
656x288, diff  -6, new AR 2.28, AR error 2.17% scale=656:288 bpp: 0.190
640x272, diff   3, new AR 2.35, AR error 1.09% scale=640:272 bpp: 0.206
624x272, diff  -4, new AR 2.29, AR error 1.45% scale=624:272 bpp: 0.211
608x256, diff   5, new AR 2.38, AR error 2.01% scale=608:256 bpp: 0.230
592x256, diff  -2, new AR 2.31, AR error 0.64% scale=592:256 bpp: 0.236
576x240, diff   8, new AR 2.40, AR error 3.03% scale=576:240 bpp: 0.259
560x240, diff   1, new AR 2.33, AR error 0.26% scale=560:240 bpp: 0.267
544x240, diff  -6, new AR 2.27, AR error 2.67% scale=544:240 bpp: 0.275
528x224, diff   3, new AR 2.36, AR error 1.27% scale=528:224 bpp: 0.303
512x224, diff  -4, new AR 2.29, AR error 1.82% scale=512:224 bpp: 0.312
496x208, diff   5, new AR 2.38, AR error 2.40% scale=496:208 bpp: 0.347
480x208, diff  -2, new AR 2.31, AR error 0.85% scale=480:208 bpp: 0.359
464x192, diff   7, new AR 2.42, AR error 3.70% scale=464:192 bpp: 0.402
448x192, diff   1, new AR 2.33, AR error 0.26% scale=448:192 bpp: 0.417
432x192, diff  -6, new AR 2.25, AR error 3.43% scale=432:192 bpp: 0.432
416x176, diff   3, new AR 2.36, AR error 1.54% scale=416:176 bpp: 0.490
400x176, diff  -4, new AR 2.27, AR error 2.40% scale=400:176 bpp: 0.509
384x160, diff   5, new AR 2.40, AR error 3.03% scale=384:160 bpp: 0.583
368x160, diff  -2, new AR 2.30, AR error 1.19% scale=368:160 bpp: 0.609
352x144, diff   7, new AR 2.44, AR error 4.79% scale=352:144 bpp: 0.707
336x144, diff   0, new AR 2.33, AR error 0.26% scale=336:144 bpp: 0.741
320x144, diff  -6, new AR 2.22, AR error 4.73% scale=320:144 bpp: 0.778
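
For readers without the script at hand, here is a hedged Python sketch of
how such a list could be produced. It is NOT calcbpp.pl: the function name,
the 720x576 default frame size and the "round the height to the nearest
multiple of 16" rule are assumptions, and the AR error column is left out.

```python
def candidate_sizes(crop_w, crop_h, display_ar, kbps, fps,
                    full_w=720, full_h=576, step=16):
    """Yield (width, height, bpp) for widths that are multiples of step.

    Assumptions: the DVD frame is full_w x full_h, and the target height
    is rounded to the nearest multiple of step.
    """
    # Width of the cropped picture once the pixels are made square.
    prescaled_w = crop_w * (full_h * display_ar) / full_w
    picture_ar = prescaled_w / crop_h
    for width in range(crop_w - crop_w % step, 0, -step):
        height = round(width / picture_ar / step) * step
        if height == 0:
            break
        yield width, height, kbps * 1000 / (width * height * fps)

# First three candidates for the example above.
for w, h, bpp in list(candidate_sizes(720, 440, 16 / 9, 896, 25))[:3]:
    print(f"scale={w}:{h} bpp: {bpp:.3f}")
```

With these assumptions the widths, heights and bpp values come out the same
as in the first lines of the listing above; other inputs may round
differently than the real script does.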

A word for the $bpp. For a fictional movie which is only black and
white: if you have a $bpp of 1 then the movie would be stored
uncompressed :) For a real life movie with 24bit color depth you
need compression of course. And the $bpp can be used to make the
decision easier.

As you can see the resolutions suggested by the script are all
divisible by 16. This will make the aspect ratio slightly wrong,
but no one will notice.

Now if you want to decide which resolution (and scaling parameters)
to choose you can do that by looking at the $bpp:

< 0.10: don't do it. Please. I beg you!
< 0.15: It will look bad.
< 0.20: You will notice blocks, but it will look ok.
< 0.25: It will look really good.
> 0.25: It won't really improve visually.
> 0.30: Don't do that either - try a bigger resolution instead.

Of course these values are not absolutes! For movies with really lots
of black areas 0.15 may look very good. Action movies with only high
motion scenes on the other hand may not look perfect at 0.25. But these
values give you a great idea about which resolution to choose.
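
The table can be turned into a small lookup. This is only a sketch: the
function name and the shortened verdicts are made up here, and treating a
$bpp of exactly 0.25 as "no real improvement" is an assumption, since the
table leaves that boundary open.

```python
def rate_bpp(bpp):
    """Map a bpp value to the rough verdicts from the table."""
    if bpp < 0.10:
        return "don't do it"
    if bpp < 0.15:
        return "will look bad"
    if bpp < 0.20:
        return "blocks noticeable, but ok"
    if bpp < 0.25:
        return "really good"
    if bpp <= 0.30:
        return "no real visual improvement"
    return "wasteful, try a bigger resolution"

print(rate_bpp(0.164))  # 720x304 in the example listing
print(rate_bpp(0.312))  # 512x224 in the example listing
```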

I see a lot of people always using 512 for the width and scaling
the height accordingly. For my (real-world-)example this would be
simply a waste of bandwidth. The encoder would probably not even
need the full bitrate, and the resulting file would be smaller
than my targeted 700M.

After encoding you'll do your 'quality check'. First fire up the movie
and see whether it looks good to you or not. But you can also do a
more 'scientific' analysis. The second Perl script I attached counts
the quantizers used for the encoding. Simply call it with

countquant.pl < divx2pass.log

It will print out which quantizer was used how often. If you see that
e.g. the lowest quantizer (vqmin=2) gets used for > 95% of the frames
then you can safely increase your picture size.

> The "counting the quantesizer"-thing could improve the quality of
> full automated scripts, as I understand ?

Yes, the log file analysis can be used by tools to automatically adjust
the scaling parameters (if you'd do that you'd end up with a three-pass
encoding for the video only ;)), but it can also provide answers for
you as a human. From time to time there's a question like 'hey,
mencoder creates files that are too small! I specified this bitrate and
the resulting file is 50megs short of the target file size!'. The
reason is probably that the codec already uses the minimum quantizer
for nearly all frames so it simply does not need more bits. A quick
glance at the distribution of the quantizers can be enlightening.

Another thing is that q=2 and q=3 look really good while the 'bigger'
quantizers really lose quality. So if your distribution shows the
majority of quantizers at 4 and above then you should probably decrease
the resolution (you'll definitely see block artefacts).
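
Both rules of thumb (more than 95% of frames at vqmin, or the majority at
q=4 and above) can be checked mechanically. The sketch below assumes the
per-frame quantizers have already been extracted into a list; parsing
divx2pass.log is not shown because its format is not described here. The
function name and the returned strings are illustrative only.

```python
from collections import Counter

def quantizer_advice(quantizers, vqmin=2):
    """Apply the two rules of thumb to a non-empty list of quantizers."""
    counts = Counter(quantizers)
    total = len(quantizers)
    share_min = counts[vqmin] / total
    share_high = sum(n for q, n in counts.items() if q >= 4) / total
    if share_min > 0.95:
        return "increase the picture size"
    if share_high > 0.5:
        return "decrease the resolution"
    return "keep the current settings"

# Hypothetical distributions, for illustration only.
print(quantizer_advice([2] * 97 + [3] * 3))
print(quantizer_advice([3] * 4 + [4] * 6))
```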


Well... Several people will probably disagree with me on certain 
points here, especially when it comes down to hard values (like the
$bpp categories and the percentage of the quantizers used). But
the idea is still valid.

And that's why I think that there should NOT be presets in mencoder
like the presets lame knows. 'Good quality' or 'perfect quality' are
ALWAYS relative. They always depend on a person's personal preferences.
If you want good quality then spend some time reading and - more
important - understanding what steps are involved in video encoding.
You cannot do it without mathematics. Oh well, you can, but you'll
end up with movies that could certainly look better.

Now please shoot me if you have any complaints ;)

-- 
 ==> Ciao, Mosu (Moritz Bunkus)

===========
ANOTHER APPROACH: BITS PER BLOCK:

>          $videobitrate * 1000       
> $bpp = -----------------------
>        $width * $height * $fps

Well, I came to a similar equation by a different route. Only I
didn't use bits per pixel; in my case it was bits per block (BPB). The block
is 16x16 because lots of software depends on video width/height being
divisible by 16. And because I didn't like this 0.2 bit per pixel, when
a bit is quite atomic ;)

So the equation was something like:

                      bitrate
bpb = ------------------------------------
      fps * ((width * height) / (16 * 16))

(width and height are from destination video size, and bitrate is in
bits (i.e. 900kbps is 900000))
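
A short Python sketch of this equation (the function name is an
assumption; the sample numbers are borrowed from the 704x304 line of the
bpp listing earlier in this file). Since a block holds 16 * 16 = 256
pixels, bpb is simply 256 times the bpp.

```python
def bits_per_block(bitrate_bps, width, height, fps, block=16):
    """Bits available per block x block macroblock in each frame."""
    blocks_per_frame = (width * height) / (block * block)
    return bitrate_bps / (fps * blocks_per_frame)

bpb = bits_per_block(896_000, 704, 304, 25)
print(f"{bpb:.1f}")        # prints 42.9
print(f"{bpb / 256:.3f}")  # the matching bpp: prints 0.167
```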

This way it appeared that the minimum bits per block is ~40, very
good results are with ~50, and everything above 60 is a waste of bandwidth.
And what's actually funny is that it was independent of the codec used. The
results were exactly the same, whether I used DIV3 (with tricky nandub's
magick), ffmpeg odivx, DivX5 on Windows or XviD.

Surprisingly there is one advantage of using nandub-DIV3 for bitrate
starved encoding: ringing almost never appears this way.

But I also found out that the quality/BPB isn't constant for
drastically different resolutions. Smaller pictures (like MPEG1 sizes)
need more BPB to look good than, say, typical MPEG2 resolutions.

Robert


===========
DON'T SCALE DOWN TOO MUCH

Sometimes I found that encoding to y-scaled only DVD quality (i.e. 704 x
288 for a 2.85 film) gives better visual quality than a scaled-down
version even if the quantizers are significantly higher than for the
scaled-down version.
Keep in mind that blocks, fuzzy parts and generally MPEG artefacts in a
704x288 image will be harder to spot in full-screen mode than in a
512x208 image. In fact I've seen examples where the same movie looks
better compressed to 704x288 with an average weighted quantizer of
~3 than the same movie scaled to 576x240 with an average weighted
quantizer of 2.4.
Btw, a print of the weighted average quantizer would be nice in
countquant.pl :)
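
Such an average is easy to compute from the counts the script prints.
A minimal Python sketch, assuming "weighted" means weighted by the number
of frames that used each quantizer (the function name and the sample
distribution are made up):

```python
def average_quantizer(counts):
    """Frame-weighted average, given {quantizer: frame_count}."""
    total_frames = sum(counts.values())
    return sum(q * n for q, n in counts.items()) / total_frames

# Hypothetical distribution, for illustration only.
print(f"{average_quantizer({2: 600, 3: 300, 4: 100}):.2f}")  # prints 2.50
```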

Another point in favor of not trying to scale down too much: on heavily
scaled-down movies, the MPEG codec will need to compress relatively
high frequencies rather than low frequencies and it doesn't like that
at all. You will see diminishing returns while you scale down and
scale down again in desperate need of some bandwidth :)

In my experience, don't try to go below a width of 576 without closely
watching what's going on.

-- 
Rémi

===========
TIPS FOR ENCODING

That being  said, with  video you  have some tradeoffs  you can  make. Most
people  seem to  encode with  really basic  options, but  if you  play with
single coefficient elimination and luma masking settings, you can save lots
of bits,  resulting in  lower quantizers, which  means less  blockiness and
less ugly noise  (ringing) around sharp borders. The  tradeoff, however, is
that you'll  get some "muddiness" in  some parts of the  image. Play around
with the  settings and see  for yourself. The  options I typically  use for
(non-animated) movies are:

vlelim=-4
vcelim=9
lumi_mask=0.05
dark_mask=0.01

If things look too muddy, make the numbers closer to 0. For anime and
other animation, the above recommendations may not be so good.

Another option that may be useful is allowing four motion vectors per
macroblock (v4mv). This will increase encoding time quite a bit, and
last I checked it wasn't compatible with B frames. AFAIK, specifying
v4mv should never reduce quality, but it may prevent some old junky
versions of DivX from decoding it (can anyone confirm?). Another issue
might be increased cpu time needed for decoding (again, can anyone
confirm?).

To get more fair distribution of bits between low-detail and
high-detail scenes, you should probably try increasing vqcomp from the
default (0.5) to something in the range 0.6-0.8.

Of course you also  want to make sure you crop ALL of  the black border and
any half-black  pixels at the  edge of the image,  and make sure  the final
image dimensions after cropping and scaling are multiples of 16. Failing to
do so will drastically reduce quality.

Finally, if  you can't seem  to get good results,  you can try  scaling the
movie down  a bit smaller  or applying a weak  gaussian blur to  reduce the
amount of detail.

Now, my personal success story! I  just recently managed to fit a beautiful
encode of  Kundun (well  over 2  hours long, but  not too  many high-motion
scenes) on  one cd at  640x304, with 66 kbit/sec  abr ogg audio,  using the
options I  described above. So, IMHO  it's definitely possible to  get very
good  results with  libavcodec (certainly  MUCH better  than all  the idiot
"release groups" using DivX3  make), as long as you take  some time to play
around with the options.


Rich

============
ABOUT VLELIM, VCELIM, LUMI_MASK AND DARK_MASK PART I: LUMA & CHROMA


The l/c in vlelim and vcelim  stands for luma (brightness plane) and chroma
(color planes). These  are encoded separately in  all mpeg-like algorithms.
Anyway, the idea behind these options  is (at least from what I understand)
to use some good heuristics to determine when the change in a block is less
than the  threshold you  specify, and in  such a case,  to just  encode the
block as "no change". This saves bits and perhaps speeds up encoding. Using
a negative value  for either one means the same  thing as the corresponding
positive value,  but the DC  coefficient is also  considered. Unfortunately
I'm not familiar  enough with the mpeg terminology to  know what this means
(my first guess would be that it's  the constant term from the DCT), but it
probably  makes  the  encoder  less  likely  to  apply  single  coefficient
elimination in cases  where it would look bad.  It's presumably recommended
to use negative values for luma  (which is more noticeable) and positive for
chroma.

The other options  -- lumi_mask and dark_mask -- control  how the quantizer
is  adjusted for  really  dark or  bright regions  of  the picture.  You're
probably already  at least a  bit familiar  with the concept  of quantizers
(qscale, lower  = more precision, higher  quality, but more bits  needed to
encode). What  not everyone  seems to  know is that  the quantizer  you see
(e.g. in the 2pass logs) is just  an average for the whole frame, and lower
or higher quantizers may in fact be  used in parts of the picture with more
or less detail. Increasing the values of lumi_mask and dark_mask will cause
lavc to  aggressively increase the  quantizer in  very dark or  very bright
regions of the picture (which are  presumably not as noticeable to the human
eye) in order to save bits for use elsewhere.

Rich

===================
ABOUT VLELIM, VCELIM, LUMI_MASK AND DARK_MASK PART II: VQSCALE

OK, a quick explanation. The quantizer you set with vqscale=N is the
per-frame quantizer parameter (aka qp). However, with mpeg4 it's
allowed (and recommended!) for the encoder to vary the quantizer on a
per-macroblock (mb) basis (as I understand it, macroblocks are 16x16
regions composed of 4 8x8 luma blocks and 2 8x8 chroma blocks, u and
v). To do this, lavc scores each mb with a complexity value and
weights the quantizer accordingly. However, you can control this
behavior somewhat with scplx_mask, tcplx_mask, dark_mask, and
lumi_mask.

scplx_mask -- raise quantizer on mb's with lots of spatial complexity.
Spatial complexity is measured by variance of the texture (this is
just the actual image for I blocks and the difference from the
previous coded frame for P blocks).

tcplx_mask -- raise quantizer on mb's with lots of temporal
complexity. Temporal complexity is measured according to motion
vectors.

dark_mask -- raise quantizer on very dark mb's.

lumi_mask -- raise quantizer on very bright mb's.

Somewhere around 0-0.15 is a safe range for these values, IMHO. You
might try as high as 0.25 or 0.3. You should probably never go over
0.5 or so.

Now, about naq. When you adjust the quantizers on a per-mb basis like
this (called adaptive quantization), you might decrease or (more
likely) increase the average quantizer used, so that it no longer
matches the requested average quantizer (qp) for the frame. This will
result in weird things happening with the bitrate, at least from my
experience. What naq does is "normalize adaptive quantization". That
is, after the above masking parameters are applied on a per-mb basis,
the quantizers of all the blocks are rescaled so that the average
stays fixed at the desired qp.
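
To make the idea concrete, here is a toy Python sketch of such a
normalization. It only illustrates the principle and is not lavc's code:
the function name is invented, a simple proportional rescale is assumed,
and a real encoder works with integer quantizers and clips them to the
allowed range.

```python
def normalize_quantizers(mb_quantizers, qp):
    """Rescale per-macroblock quantizers so that their mean equals qp."""
    mean = sum(mb_quantizers) / len(mb_quantizers)
    return [q * qp / mean for q in mb_quantizers]

# Hypothetical macroblock quantizers after masking pushed the average up.
masked = [4, 5, 6, 9]
normalized = normalize_quantizers(masked, qp=4)
print(sum(normalized) / len(normalized))  # the mean is back at (about) 4
```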

So, if I used vqscale=4 with naq and fairly large values for the
masking parameters, I might be likely to see lots of frames using
qscale 2,3,4,5,6,7 across different macroblocks as needed, but with
the average sticking around 4. However, I haven't actually tested such
a setup yet, so it's just speculation right now.

Have fun playing around with it.

Rich

======================
TIPS FOR ENCODING OLD BLACK & WHITE MOVIES:

I've found that old 4:3 B&W movies are very hard to compress well. In
addition to the 4:3 aspect ratio which  eats lots of bits, those movies are
typically very "noisy", which doesn't help at all. Anyway :

> After a few tries I am                                                        
> still a little bit disappointed with the video quality. Since it is a         
> "dark" movies, there is a lot of black on the pictures, and on the            
> encoded avi I can see a lot of annoying "mpeg squares". I am using            
> avifile codec, but the best I think is to give you the command line I         
> used to encode a preview of the result:                                       

>                                                                               
> First pass:                                                                   
> mencoder TITLE01-ANGLE1.VOB -oac copy -ovc lavc -lavcopts                     
> vcodec=mpeg4:vhq:vpass=1:vbitrate=800:keyint=48 -ofps 23.976 -npp lb          
> -ss 2:00 -endpos 0:30 -vop scale -zoom -xy 640 -o movie.avi                   

1) keyint=48 is way too low. The  default value is 250, this is in *frames*
not seconds. Key frames are significantly larger than P or B frames, so the
less key frames you have, better the overall movie will be. (huh, like Yoda
I  speak ;).  Try keyint=300  or  350. Don't  go  beyond that  if you  want
relatively precise seeking.

2) you may want to play with the vlelim and vcelim options. This can give you
a significant "quality" boost. Try one of these pairs:

vlelim=-2:vcelim=3
vlelim=-3:vcelim=5
vlelim=-4:vcelim=7
(and yes, there's a minus)

3) crop & rescale the movie before  passing it to the codec. First crop the
movie  to  not  encode black  bars  if  there's  any.  For a  1h40mn  movie
compressed to  a 700  MB file,  I would try  something between  512x384 and
480x320. Don't  go below that if  you want something relatively  sharp when
viewed fullscreen.

4)  I would  recommend  using the  Ogg  Vorbis audio  codec  with the  .ogm
container format. Ogg Vorbis compresses audio better than MP3. On a typical
old,  mono-only audio  stream, a  45 kbits/s  Vorbis stream  is ok.  How to
extract  & compress  an audio  stream  from a  ripped DVD  (mplayer -dvd  1
-dumpstream) :

rm -f audiodump.pcm ; mkfifo -m 600 audiodump.pcm
mplayer -quiet -vc null -vo null -aid 128 -ao pcm -nowaveheader stream.dump &
oggenc --raw --raw-bits=16 --raw-chan=2 --raw-rate=48000 -q 1 -o audio-us.ogg \
  audiodump.pcm &
wait

For a nice set of utilities to manage the .ogm format, see Moritz Bunkus'
ogmtools (google is your friend).

5) use the "v4mv" option. This could give you a few more bits at the
expense of a slightly longer encoding. This is a "lossless" option, I mean
with this option you don't throw away any video information, it just
selects a more precise motion estimation method. Be warned that on some
very atypical scenes this option may give you a longer file than
without, although it's very rare and on a whole film I think it's always a
win.

6) you can try the new luminance & darkness masking code. Play
with the "lumi_mask" and "dark_mask" options. I would recommend using
something like :
lumi_mask=0.07:dark_mask=0.10:naq
lumi_mask=0.10:dark_mask=0.12:naq
lumi_mask=0.12:dark_mask=0.15:naq
lumi_mask=0.13:dark_mask=0.16:naq
Be warned that these options are really experimental and the result
could be very good or very bad depending on your visualization device
(computer CRT, TV or TFT screen). Don't push these options too hard.

> Second pass:                                                                  
> the same with vpass=2                                                         

7) I've found that lavc gives better results when the first pass is done
with "vqscale=2" instead of a target bitrate. The statistics collected
seem to be more precise. YMMV.

> I am new to mencoder, so please tell me any idea you have even if it          
> obvious. I also tried the "gray" option of lavc, to encode B&W only,          
> but strangely it gives me "pink" squares from time to time.                   

Yes, I've seen that too. Playing the resulting file with "-lavdopts gray"
fixes the problem but it's not very nice ...

> So if you could tell me what option of mencoder or lavc I should be           
> looking at to lower the number of "squares" on the image, it would be         
> great. The version of mencoder i use is 0.90pre8 on a macos x PPC             
> platform. I guess I would have the same problem by encoding anime             
> movies, where there are a lot of region of the image with the same            
> color. So if you managed to solve this problem...                             

You could also try the "mpeg_quant" flag. It selects a different set of
quantizers and produces somewhat sharper pictures and fewer blocks on large
zones with the same or similar luminance, at the expense of some bits.

> This is completely off topic, but do you know how I can create good           
> subtitles from vobsub subtitles ? I checked the -dumpmpsub option of          
> mplayer, but is there a way to do it really fast (ie without having to        
> play the whole movie) ?                                                       

I didn't find a way under *nix to produce reasonably good text subtitles
from vobsubs. OCR software for *nix seems either not suited to the task, not
powerful enough or both. I'm extracting the vobsub subtitles and simply use
them with the .ogm / .avi :

1) rip the DVD to harddisk with "mplayer -dvd 1 -dumpstream"
2) mount the DVD and copy the .ifo file
3) extract all vobsubs to one single file with something like :

for f in 0 1 2 3 4 5 6 7 8 9 10 11 ; do \
    mencoder -ovc copy -oac copy -o /dev/null -sid $f -vobsubout sous-titres \
        -vobsuboutindex $f -ifo vts_01_0.ifo stream.dump
done

(and yes, I've a DVD with 12 subtitles)
--                                                                              
Rémi

================================

TIPS FOR SMOKE & CLOUDS 

Q: I'm trying  to encode Dante's Peak and I'm  having problems with clouds,
fog and  smoke: They don't  look fine  (they look very  bad if I  watch the
movie  in TVout).  There are  some artifacts,  white clouds  looks as  snow
mountains, there are things likes hip in the colors so one can see frontier
curves between white and light gray and  dark gray ... (I don't know if you
can understand me, I want to mean that the colors don't change smoothly)
In particular I'm using vqscale=2:vhq:v4mv

A: Try adding "vqcomp=0.7:vqblur=0.2:mpeg_quant" to lavcopts.

Q: I tried your suggestion and it improved the image a little ... but not
enough. I was playing with different options and I couldn't find the way.
I suppose that the vob is not so good (watching it on TV through the
computer looks better than my encoding, but it isn't a lot better).

A: Yes, those scenes with qscale=2 look terrible :-(

Try with vqmin=1 in addition to mpeg_quant:vlelim=-4:vcelim=-7 (and maybe
with "-sws 10 -ssf ls=1" to sharpen the image a bit) and read about vqmin=1
in DOCS/tech/libavc-options.txt.

If after the whole movie is encoded you still see the same problem, it
means that the second pass didn't pick q=1 for this scene. Force q=1
with the "vrc_override" option.

Q: By the way, is there a special difficulty in encoding clouds or smoke?

A: I would say it depends on the sharpness of these clouds / smoke and
whether they are mostly black/white/grey or colored. The codec will do
the right thing with vqmin=2 for example on cigarette smoke (sharp) or on
a red/yellow cloud (explosion, cloud of fire). But it may not with a grey and
very fuzzy cloud like in the chocolat scene. Note that I don't know exactly
why ;)

A = Rémi