Some important URLs:
~~~~~~~~~~~~~~~~~~~~
http://www.mplayerhq.hu/~michael/codec-features.html <- lavc vs. divx5 vs. xvid
http://rguyom.chez.tiscali.fr/libavcodec_tests.html <- lavc benchmarks, options
http://cutka.szm.sk/ffdshow/index.html <- lavc for win32 :)
http://www.bunkus.org/dvdripping4linux/index.html <- a nice tutorial
================================================================================
ENCODING QUALITY - OR WHY AUTOMATISM IS BAD.
Hi everyone.
Some days ago someone suggested adding some preset options to mencoder.
At that time I replied 'don't do that', and now I decided to elaborate
on that.
Warning: this is rather long, and it involves mathematics. But if you
don't want to bother with either then why are you encoding in the
first place? Go do something different!
The good news is: it's all about the bpp (bits per pixel).
The bad news is: it's not THAT easy ;)
This mail is about encoding a DVD to MPEG4. It's about the video
quality, not (primarily) about the audio quality or some other fancy
things like subtitles.
The first step is to encode the audio. Why? Well if you encode the
audio prior to the video you'll have to make the video fit onto one
(or two) CD(s). That way you can use oggenc's quality based encoding
mode which is much more sophisticated than its ABR based mode.
After encoding the audio you have a certain amount of space left to
fill with video. Let's assume the audio takes 60M (no problem with
Vorbis), and you aim at a 700M CD. This leaves you 640M for the video.
Let's further assume that the video is 100 minutes or 6000 seconds
long, encoded at 25fps (those nasty NTSC fps values give me
headaches. Adjust to your needs, of course!). This leaves you with
a video bitrate of:
                 $videosize * 8
$videobitrate = ----------------
                 $length * 1000
$videosize in bytes, $length in seconds, $videobitrate in kbit/s.
In my example I end up with $videobitrate = 895.
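
As a quick check of the arithmetic above, here is a minimal Python sketch.
It is not part of mencoder or of the attached scripts, and the function
name is made up for illustration. It assumes that "M" means 1024 * 1024
bytes, which is what reproduces the 895 figure.

```python
def video_bitrate_kbit(videosize_bytes, length_seconds):
    """Video bitrate in kbit/s for a given video size and running time."""
    return videosize_bytes * 8 / (length_seconds * 1000)

# A 700M CD minus 60M of audio leaves 640M for 6000 seconds of video.
videosize = 640 * 1024 * 1024  # assumption: 1M = 1024 * 1024 bytes
print(round(video_bitrate_kbit(videosize, 6000)))  # prints 895
```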
And now comes the question: how do I choose my encoding parameters
so that the results will be good? First let's take a look at a
typical mencoder line:
mencoder -dvd 1 -o /dev/null -oac copy -ovc lavc \
-lavcopts vcodec=mpeg4:vbitrate=1000:vhq:vqmin=2:\
vlelim=-4:vcelim=9:lumi_mask=0.05:dark_mask=0.01:vpass=1 \
-vf crop=716:572:2:2,scale=640:480
Phew, all those parameters! Which ones should I change? NEVER leave
out 'vhq'. Never ever. 'vqmin=2' is always good if you aim for sane
settings - like 'normal length' movies on one CD, 'very long movies'
on two CDs and so on. vcodec=mpeg4 is mandatory.
The 'vlelim=-4:vcelim=9:lumi_mask=0.05:dark_mask=0.01' are parameters
suggested by D Richard Felker for non-animated movies, and they
improve quality a bit.
But the two things that have the most influence on quality are
vbitrate and scale. Why? Because both together tell the codec how
many bits it may spend on each frame for each pixel: and this is
the 'bpp' value (bits per pixel). It's simply defined as
          $videobitrate * 1000
$bpp = -------------------------
        $width * $height * $fps
I've attached a small Perl script that calculates the $bpp for
a movie. You'll have to give it four parameters:
a) the cropped but unscaled resolution (use '-vf cropdetect'),
b) the encoded aspect ratio. All DVDs come at 720x576 but contain
a flag that tells the player whether it should display the DVD at
an aspect ratio of 4/3 (1.333) or at 16/9 (1.777). Have a look
at mplayer's output - there's something about 'prescaling'. That's
what you are looking for.
c) the video bitrate in kbit/s and
d) the fps.
In my example the command line and calcbpp.pl's output would look
like this (warning - long lines ahead):
mosu@anakin:~$ ./calcbpp.pl 720x440 16/9 896 25
Prescaled picture: 1023x440, AR 2.33
720x304, diff 5, new AR 2.37, AR error 1.74% scale=720:304 bpp: 0.164
704x304, diff -1, new AR 2.32, AR error 0.50% scale=704:304 bpp: 0.167
688x288, diff 8, new AR 2.39, AR error 2.58% scale=688:288 bpp: 0.181
672x288, diff 1, new AR 2.33, AR error 0.26% scale=672:288 bpp: 0.185
656x288, diff -6, new AR 2.28, AR error 2.17% scale=656:288 bpp: 0.190
640x272, diff 3, new AR 2.35, AR error 1.09% scale=640:272 bpp: 0.206
624x272, diff -4, new AR 2.29, AR error 1.45% scale=624:272 bpp: 0.211
608x256, diff 5, new AR 2.38, AR error 2.01% scale=608:256 bpp: 0.230
592x256, diff -2, new AR 2.31, AR error 0.64% scale=592:256 bpp: 0.236
576x240, diff 8, new AR 2.40, AR error 3.03% scale=576:240 bpp: 0.259
560x240, diff 1, new AR 2.33, AR error 0.26% scale=560:240 bpp: 0.267
544x240, diff -6, new AR 2.27, AR error 2.67% scale=544:240 bpp: 0.275
528x224, diff 3, new AR 2.36, AR error 1.27% scale=528:224 bpp: 0.303
512x224, diff -4, new AR 2.29, AR error 1.82% scale=512:224 bpp: 0.312
496x208, diff 5, new AR 2.38, AR error 2.40% scale=496:208 bpp: 0.347
480x208, diff -2, new AR 2.31, AR error 0.85% scale=480:208 bpp: 0.359
464x192, diff 7, new AR 2.42, AR error 3.70% scale=464:192 bpp: 0.402
448x192, diff 1, new AR 2.33, AR error 0.26% scale=448:192 bpp: 0.417
432x192, diff -6, new AR 2.25, AR error 3.43% scale=432:192 bpp: 0.432
416x176, diff 3, new AR 2.36, AR error 1.54% scale=416:176 bpp: 0.490
400x176, diff -4, new AR 2.27, AR error 2.40% scale=400:176 bpp: 0.509
384x160, diff 5, new AR 2.40, AR error 3.03% scale=384:160 bpp: 0.583
368x160, diff -2, new AR 2.30, AR error 1.19% scale=368:160 bpp: 0.609
352x144, diff 7, new AR 2.44, AR error 4.79% scale=352:144 bpp: 0.707
336x144, diff 0, new AR 2.33, AR error 0.26% scale=336:144 bpp: 0.741
320x144, diff -6, new AR 2.22, AR error 4.73% scale=320:144 bpp: 0.778
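
For those who want to see how such a listing can be computed, here is a
rough Python sketch. It is NOT calcbpp.pl itself: the function name, the
defaults (a 720x576 source, widths down to 320) and the rounding are my
assumptions, so the AR errors it prints can differ slightly from the
listing above. The bpp values follow the formula given earlier.

```python
def suggest_resolutions(cropped_w, cropped_h, encoded_ar, bitrate_kbit, fps,
                        raw_w=720, raw_h=576, min_w=320, step=16):
    """Yield (width, height, ar_error_percent, bpp) for widths divisible by 16."""
    # Stretch the cropped width so the picture gets its display aspect ratio.
    prescaled_w = cropped_w * encoded_ar / (raw_w / raw_h)
    picture_ar = prescaled_w / cropped_h
    for width in range(cropped_w - cropped_w % step, min_w - 1, -step):
        # Round the ideal height to the nearest multiple of 16.
        height = max(step, round(width / picture_ar / step) * step)
        ar_error = abs(width / height - picture_ar) / picture_ar * 100
        bpp = bitrate_kbit * 1000 / (width * height * fps)
        yield width, height, ar_error, bpp

# Same input as the calcbpp.pl call above: 720x440, 16/9, 896 kbit/s, 25 fps.
for w, h, err, bpp in suggest_resolutions(720, 440, 16 / 9, 896, 25):
    print(f"{w}x{h}, AR error {err:.2f}%, scale={w}:{h} bpp: {bpp:.3f}")
```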
A word for the $bpp. For a fictional movie which is only black and
white: if you have a $bpp of 1 then the movie would be stored
uncompressed :) For a real life movie with 24bit color depth you
need compression of course. And the $bpp can be used to make the
decision easier.
As you can see the resolutions suggested by the script are all
divisible by 16. This will make the aspect ratio slightly wrong,
but no one will notice.
Now if you want to decide which resolution (and scaling parameters)
to choose you can do that by looking at the $bpp:
< 0.10: don't do it. Please. I beg you!
< 0.15: It will look bad.
< 0.20: You will notice blocks, but it will look ok.
< 0.25: It will look really good.
> 0.25: It won't really improve visually.
> 0.30: Don't do that either - try a bigger resolution instead.
Of course these values are not absolutes! For movies with really lots
of black areas 0.15 may look very good. Action movies with only high
motion scenes on the other hand may not look perfect at 0.25. But these
values give you a great idea about which resolution to choose.
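
If you want these rules of thumb in a script, a small Python sketch could
look like this. The function name and the wording of the verdicts are
mine, and the thresholds are only the rough values from the table above,
not absolutes.

```python
def rate_bpp(bpp):
    """Map a bpp value to the rough verdicts from the table above."""
    if bpp < 0.10:
        return "don't do it"
    if bpp < 0.15:
        return "will look bad"
    if bpp < 0.20:
        return "visible blocks, but ok"
    if bpp <= 0.25:
        return "really good"
    if bpp <= 0.30:
        return "no real visual improvement"
    return "wasteful, try a bigger resolution"

# Two candidates from the calcbpp.pl listing: 720x304 and 640x272.
for value in (0.164, 0.206):
    print(value, "->", rate_bpp(value))
```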
I see a lot of people always using 512 for the width and scaling
the height accordingly. For my (real-world-)example this would be
simply a waste of bandwidth. The encoder would probably not even
need the full bitrate, and the resulting file would be smaller
than my targeted 700M.
After encoding you'll do your 'quality check'. First fire up the movie
and see whether it looks good to you or not. But you can also do a
more 'scientific' analysis. The second Perl script I attached counts
the quantizers used for the encoding. Simply call it with
countquant.pl < divx2pass.log
It will print out which quantizer was used how often. If you see that
e.g. the lowest quantizer (vqmin=2) gets used for > 95% of the frames
then you can safely increase your picture size.
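
The idea behind such a counter can be sketched in a few lines of Python.
This is not countquant.pl: the function names are made up, and the log
layout is an assumption (each frame line is expected to carry a field
like "q:2.00"). Some libavcodec versions store a scaled value in that
field, so check your own log before trusting the numbers.

```python
import re
from collections import Counter

Q_FIELD = re.compile(r"\bq:(\d+(?:\.\d+)?)")  # assumed log field, see above

def count_quantizers(lines):
    """Return a Counter mapping rounded quantizer -> number of frames."""
    counts = Counter()
    for line in lines:
        match = Q_FIELD.search(line)
        if match:
            counts[round(float(match.group(1)))] += 1
    return counts

def print_histogram(counts):
    """Print how often each quantizer was used, with percentages."""
    total = sum(counts.values())
    if total == 0:
        return
    for q in sorted(counts):
        print(f"q={q:2d}: {counts[q]:6d} frames ({100 * counts[q] / total:5.1f}%)")

# Made-up sample lines, only to show the call; real logs have more fields.
sample = ["in:0 out:0 type:1 q:2.00", "in:1 out:1 type:2 q:2.00",
          "in:2 out:2 type:2 q:3.00"]
print_histogram(count_quantizers(sample))
```

To read a real log the way the Perl script does, pass sys.stdin (after
"import sys") instead of the sample list.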
> The "counting the quantesizer"-thing could improve the quality of
> full automated scripts, as I understand ?
Yes, the log file analysis can be used by tools to automatically adjust
the scaling parameters (if you'd do that you'd end up with a three-pass
encoding for the video only ;)), but it can also provide answers for
you as a human. From time to time there's a question like 'hey,
mencoder creates files that are too small! I specified this bitrate and
the resulting file is 50megs short of the target file size!'. The
reason is probably that the codec already uses the minimum quantizer
for nearly all frames so it simply does not need more bits. A quick
glance at the distribution of the quantizers can be enlightening.
Another thing is that q=2 and q=3 look really good while the 'bigger'
quantizers really lose quality. So if your distribution shows the
majority of quantizers at 4 and above then you should probably decrease
the resolution (you'll definitely see block artefacts).
Well... Several people will probably disagree with me on certain
points here, especially when it comes down to hard values (like the
$bpp categories and the percentage of the quantizers used). But
the idea is still valid.
And that's why I think that there should NOT be presets in mencoder
like the presets lame knows. 'Good quality' or 'perfect quality' are
ALWAYS relative. They always depend on a person's personal preferences.
If you want good quality then spend some time reading and - more
important - understanding what steps are involved in video encoding.
You cannot do it without mathematics. Oh well, you can, but you'll
end up with movies that could certainly look better.
Now please shoot me if you have any complaints ;)
--
==> Ciao, Mosu (Moritz Bunkus)
===========
ANOTHER APPROACH: BITS PER BLOCK:
>           $videobitrate * 1000
> $bpp = -------------------------
>         $width * $height * $fps
Well, I came to similar equation going through different route. Only I
didn't use bits per pixel, in my case it was bits per block (BPB). The block
is 16x16 because lots of software depends on video width/height being
divisible by 16. And because I didn't like this 0.2 bit per pixel, when
a bit is quite atomic ;)
So the equation was something like:
                     bitrate
bpb = --------------------------------------
       fps * ((width * height) / (16 * 16))
(width and height are from destination video size, and bitrate is in
bits (i.e. 900kbps is 900000))
This way it appeared that the minimum bits per block is ~40, very
good results are with ~50, and everything above 60 is a waste of bandwidth.
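
A minimal Python sketch of the BPB formula, for anyone who wants to plug
in numbers. The function name and the 640x272 example resolution are just
illustrations. Note that bpb is simply 256 times the bpp from the first
mail, so 40 / 50 / 60 bits per block correspond to roughly 0.16 / 0.20 /
0.23 bpp.

```python
def bits_per_block(bitrate_bps, width, height, fps, block=16):
    """Average number of bits available per 16x16 block per frame."""
    blocks_per_frame = (width * height) / (block * block)
    return bitrate_bps / (fps * blocks_per_frame)

# 900 kbps (900000 bits/s) at 640x272 and 25 fps.
print(round(bits_per_block(900000, 640, 272, 25), 1))  # prints 52.9
```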
And what's actually funny is that it was independent of the codec used. The
results were exactly the same, whether I used DIV3 (with tricky nandub's
magick), ffmpeg odivx, DivX5 on Windows or XviD.
Surprisingly there is one advantage of using nandub-DIV3 for bitrate
starved encoding: ringing almost never appears this way.
But I also found out, that the quality/BPB isn't constant for
drastically different resolutions. Smaller pictures (like MPEG1 sizes)
need more BPB to look good than, say, typical MPEG2 resolutions.
Robert
===========
DON'T SCALE DOWN TOO MUCH
Sometimes I found that encoding to y-scaled only DVD quality (ie 704 x
288 for a 2.35 film) gives better visual quality than a scaled-down
version even if the quantizers are significantly higher than for the
scaled-down version.
Keep in mind that blocks, fuzzy parts and generally mpeg artefacts in a
704x288 image will be harder to spot in full-screen mode than on a
512x208 image. In fact I've seen examples where the same movie looks
better compressed to 704x288 with an average weighted quantizer of
~3 than the same movie scaled to 576x240 with an average weighted
quantizer of 2.4.
Btw, a print of the weighted average quantizer would be nice in
countquant.pl :)
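
Such a weighted average is easy to compute from the per-quantizer frame
counts. A minimal Python sketch follows; the function name is made up,
and it assumes that "weighted" means weighted by the number of frames
encoded with each quantizer.

```python
def weighted_average_quantizer(counts):
    """counts maps quantizer -> number of frames encoded with it."""
    total_frames = sum(counts.values())
    if total_frames == 0:
        raise ValueError("no frames counted")
    return sum(q * n for q, n in counts.items()) / total_frames

# Made-up distribution: 60 frames at q=2, 30 at q=3, 10 at q=4.
print(weighted_average_quantizer({2: 60, 3: 30, 4: 10}))  # prints 2.5
```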
Another point in favor of not trying to scale down too much : on hard
scaled-down movies, the MPEG codec will need to compress relatively
high frequencies rather than low frequencies and it doesn't like that
at all. You will see diminishing returns while you scale down and
scale down again in desperate need of some bandwidth :)
In my experience, don't try to go below a width of 576 without closely
watching what's going on.
--
Rémi
===========
TIPS FOR ENCODING
That being said, with video you have some tradeoffs you can make. Most
people seem to encode with really basic options, but if you play with
single coefficient elimination and luma masking settings, you can save lots
of bits, resulting in lower quantizers, which means less blockiness and
less ugly noise (ringing) around sharp borders. The tradeoff, however, is
that you'll get some "muddiness" in some parts of the image. Play around
with the settings and see for yourself. The options I typically use for
(non-animated) movies are:
vlelim=-4
vcelim=9
lumi_mask=0.05
dark_mask=0.01
If things look too muddy, make the numbers closer to 0. For anime and
other animation, the above recommendations may not be so good.
Another option that may be useful is allowing four motion vectors per
macroblock (v4mv). This will increase encoding time quite a bit, and
last I checked it wasn't compatible with B frames. AFAIK, specifying
v4mv should never reduce quality, but it may prevent some old junky
versions of DivX from decoding it (can anyone confirm?). Another issue
might be increased cpu time needed for decoding (again, can anyone
confirm?).
To get a fairer distribution of bits between low-detail and
high-detail scenes, you should probably try increasing vqcomp from the
default (0.5) to something in the range 0.6-0.8.
Of course you also want to make sure you crop ALL of the black border and
any half-black pixels at the edge of the image, and make sure the final
image dimensions after cropping and scaling are multiples of 16. Failing to
do so will drastically reduce quality.
Finally, if you can't seem to get good results, you can try scaling the
movie down a bit smaller or applying a weak gaussian blur to reduce the
amount of detail.
Now, my personal success story! I just recently managed to fit a beautiful
encode of Kundun (well over 2 hours long, but not too many high-motion
scenes) on one cd at 640x304, with 66 kbit/sec abr ogg audio, using the
options I described above. So, IMHO it's definitely possible to get very
good results with libavcodec (certainly MUCH better than all the idiot
"release groups" using DivX3 make), as long as you take some time to play
around with the options.
Rich
============
ABOUT VLELIM, VCELIM, LUMI_MASK AND DARK_MASK PART I: LUMA & CHROMA
The l/c in vlelim and vcelim stands for luma (brightness plane) and chroma
(color planes). These are encoded separately in all mpeg-like algorithms.
Anyway, the idea behind these options is (at least from what I understand)
to use some good heuristics to determine when the change in a block is less
than the threshold you specify, and in such a case, to just encode the
block as "no change". This saves bits and perhaps speeds up encoding. Using
a negative value for either one means the same thing as the corresponding
positive value, but the DC coefficient is also considered. Unfortunately
I'm not familiar enough with the mpeg terminology to know what this means
(my first guess would be that it's the constant term from the DCT), but it
probably makes the encoder less likely to apply single coefficient
elimination in cases where it would look bad. It's presumably recommended
to use negative values for luma (which is more noticeable) and positive for
chroma.
The other options -- lumi_mask and dark_mask -- control how the quantizer
is adjusted for really dark or bright regions of the picture. You're
probably already at least a bit familiar with the concept of quantizers
(qscale, lower = more precision, higher quality, but more bits needed to
encode). What not everyone seems to know is that the quantizer you see
(e.g. in the 2pass logs) is just an average for the whole frame, and lower
or higher quantizers may in fact be used in parts of the picture with more
or less detail. Increasing the values of lumi_mask and dark_mask will cause
lavc to aggressively increase the quantizer in very dark or very bright
regions of the picture (which are presumably not as noticeable to the human
eye) in order to save bits for use elsewhere.
Rich
===================
ABOUT VLELIM, VCELIM, LUMI_MASK AND DARK_MASK PART II: VQSCALE
OK, a quick explanation. The quantizer you set with vqscale=N is the
per-frame quantizer parameter (aka qp). However, with mpeg4 it's
allowed (and recommended!) for the encoder to vary the quantizer on a
per-macroblock (mb) basis (as I understand it, macroblocks are 16x16
regions composed of 4 8x8 luma blocks and 2 8x8 chroma blocks, u and
v). To do this, lavc scores each mb with a complexity value and
weights the quantizer accordingly. However, you can control this
behavior somewhat with scplx_mask, tcplx_mask, dark_mask, and
lumi_mask.
scplx_mask -- raise quantizer on mb's with lots of spatial complexity.
Spatial complexity is measured by variance of the texture (this is
just the actual image for I blocks and the difference from the
previous coded frame for P blocks).
tcplx_mask -- raise quantizer on mb's with lots of temporal
complexity. Temporal complexity is measured according to motion
vectors.
dark_mask -- raise quantizer on very dark mb's.
lumi_mask -- raise quantizer on very bright mb's.
Somewhere around 0-0.15 is a safe range for these values, IMHO. You
might try as high as 0.25 or 0.3. You should probably never go over
0.5 or so.
Now, about naq. When you adjust the quantizers on a per-mb basis like
this (called adaptive quantization), you might decrease or (more
likely) increase the average quantizer used, so that it no longer
matches the requested average quantizer (qp) for the frame. This will
result in weird things happening with the bitrate, at least from my
experience. What naq does is "normalize adaptive quantization". That
is, after the above masking parameters are applied on a per-mb basis,
the quantizers of all the blocks are rescaled so that the average
stays fixed at the desired qp.
So, if I used vqscale=4 with naq and fairly large values for the
masking parameters, I might be likely to see lots of frames using
qscale 2,3,4,5,6,7 across different macroblocks as needed, but with
the average sticking around 4. However, I haven't actually tested such
a setup yet, so it's just speculation right now.
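
To make the normalization idea concrete, here is a toy Python sketch. It
only illustrates the rescaling described above and is not lavc's actual
code: the function name and the sample values are made up, and a real
encoder would also have to round and clamp the results to valid integer
quantizers.

```python
def normalize_quantizers(mb_quants, frame_qp):
    """Rescale per-macroblock quantizers so that their mean equals frame_qp."""
    mean = sum(mb_quants) / len(mb_quants)
    return [q * frame_qp / mean for q in mb_quants]

# Made-up macroblock quantizers after masking pushed the average above 4.
masked = [3, 4, 5, 6, 7, 8]
normalized = normalize_quantizers(masked, 4)
print([round(q, 2) for q in normalized])
print(sum(normalized) / len(normalized))  # average is back at about 4
```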
Have fun playing around with it.
Rich
======================
TIPS FOR ENCODING OLD BLACK & WHITE MOVIES:
I have found that old 4:3 B&W movies are very hard to compress well. In
addition to the 4:3 aspect ratio which eats lots of bits, those movies are
typically very "noisy", which doesn't help at all. Anyway :
> After a few tries I am
> still a little bit disappointed with the video quality. Since it is a
> "dark" movies, there is a lot of black on the pictures, and on the
> encoded avi I can see a lot of annoying "mpeg squares". I am using
> avifile codec, but the best I think is to give you the command line I
> used to encode a preview of the result:
>
> First pass:
> mencoder TITLE01-ANGLE1.VOB -oac copy -ovc lavc -lavcopts
> vcodec=mpeg4:vhq:vpass=1:vbitrate=800:keyint=48 -ofps 23.976 -npp lb
> -ss 2:00 -endpos 0:30 -vf scale -zoom -xy 640 -o movie.avi
1) keyint=48 is way too low. The default value is 250, and it is in
*frames*, not seconds. Key frames are significantly larger than P or B
frames, so the fewer key frames you have, the better the overall movie will
be (huh, like Yoda I speak ;). Try keyint=300 or 350. Don't go beyond that
if you want relatively precise seeking.
2) you may want to play with the vlelim and vcelim options. This can give
you a significant "quality" boost. Try one of these pairs:
vlelim=-2:vcelim=3
vlelim=-3:vcelim=5
vlelim=-4:vcelim=7
(and yes, there's a minus)
3) crop & rescale the movie before passing it to the codec. First crop the
movie so that you don't encode black bars, if there are any. For a 1h40mn movie
compressed to a 700 MB file, I would try something between 512x384 and
480x320. Don't go below that if you want something relatively sharp when
viewed fullscreen.
4) I would recommend using the Ogg Vorbis audio codec with the .ogm
container format. Ogg Vorbis compresses audio better than MP3. On a typical
old, mono-only audio stream, a 45 kbits/s Vorbis stream is ok. How to
extract & compress an audio stream from a ripped DVD (mplayer -dvd 1
-dumpstream) :
rm -f audiodump.pcm ; mkfifo -m 600 audiodump.pcm
mplayer -quiet -vc null -vo null -aid 128 -ao pcm -nowaveheader stream.dump &
oggenc --raw --raw-bits=16 --raw-chan=2 --raw-rate=48000 -q 1 -o audio-us.ogg \
audiodump.pcm &
wait
For a nice set of utilities to manage the .ogm format, see Moritz Bunkus'
ogmtools (google is your friend).
5) use the "v4mv" option. This could give you a few more bits at the
expense of a slightly longer encoding. This is a "lossless" option, I mean
with this option you don't throw away some video information, it just
selects a more precise motion estimation method. Be warned that on some
very untypical scenes this option may give you a longer file than
without, although it's very rare and on a whole film I think it's always a
win.
6) you can try the new luminance & darkness masking code. Play
with the "lumi_mask" and "dark_mask" options. I would recommend using
something like :
lumi_mask=0.07:dark_mask=0.10:naq
lumi_mask=0.10:dark_mask=0.12:naq
lumi_mask=0.12:dark_mask=0.15:naq
lumi_mask=0.13:dark_mask=0.16:naq
Be warned that these options are really experimental and the result
could be very good or very bad depending on your visualization device
(computer CRT, TV or TFT screen). Don't push these options too hard.
> Second pass:
> the same with vpass=2
7) I've found that lavc gives better results when the first pass is done
with "vqscale=2" instead of a target bitrate. The statistics collected
seem to be more precise. YMMV.
> I am new to mencoder, so please tell me any idea you have even if it
> obvious. I also tried the "gray" option of lavc, to encode B&W only,
> but strangely it gives me "pink" squares from time to time.
Yes, I've seen that too. Playing the resulting file with "-lavdopts gray"
fixes the problem but it's not very nice ...
> So if you could tell me what option of mencoder or lavc I should be
> looking at to lower the number of "squares" on the image, it would be
> great. The version of mencoder i use is 0.90pre8 on a macos x PPC
> platform. I guess I would have the same problem by encoding anime
> movies, where there are a lot of region of the image with the same
> color. So if you managed to solve this problem...
You could also try the "mpeg_quant" flag. It selects a different set of
quantizers and produces somewhat sharper pictures and fewer blocks on large
zones with the same or similar luminance, at the expense of some bits.
> This is completely off topic, but do you know how I can create good
> subtitles from vobsub subtitles ? I checked the -dumpmpsub option of
> mplayer, but is there a way to do it really fast (ie without having to
> play the whole movie) ?
I didn't find a way under *nix to produce reasonably good text subtitles
from vobsubs. OCR software for *nix seems either not suited to the task, not
powerful enough or both. I'm extracting the vobsub subtitles and simply use
them with the .ogm / .avi :
1) rip the DVD to harddisk with "mplayer -dvd 1 -dumpstream"
2) mount the DVD and copy the .ifo file
3) extract all vobsubs to one single file with something like :
for f in 0 1 2 3 4 5 6 7 8 9 10 11 ; do \
mencoder -ovc copy -oac copy -o /dev/null -sid $f -vobsubout sous-titres \
-vobsuboutindex $f -ifo vts_01_0.ifo stream.dump
done
(and yes, I've a DVD with 12 subtitles)
--
Rémi
================================
TIPS FOR SMOKE & CLOUDS
Q: I'm trying to encode Dante's Peak and I'm having problems with clouds,
fog and smoke: They don't look fine (they look very bad if I watch the
movie in TVout). There are some artifacts, white clouds looks as snow
mountains, there are things likes hip in the colors so one can see frontier
curves between white and light gray and dark gray ... (I don't know if you
can understand me, I want to mean that the colors don't change smoothly)
In particular I'm using vqscale=2:vhq:v4mv
A: Try adding "vqcomp=0.7:vqblur=0.2:mpeg_quant" to lavcopts.
Q: I tried your suggestion and it improved the image a little ... but not
enough. I was playing with different options and I couldn't find the way.
I suppose that the vob is not so good (watching it in TV through the
computer looks better than my encoding, but it isn't a lot of better).
A: Yes, those scenes with qscale=2 look terrible :-(
Try with vqmin=1 in addition to mpeg_quant:vlelim=-4:vcelim=-7 (and maybe
with "-sws 10 -ssf ls=1" to sharpen the image a bit) and read about vqmin=1
in DOCS/tech/libavc-options.txt.
If after the whole movie is encoded you still see the same problem, it
means that the second pass didn't pick q=1 for this scene. Force q=1
with the "vrc_override" option.
Q: By the way, is there a special difficult in encode clouds or smoke?
A: I would say it depends on the sharpness of these clouds / smokes and on
whether they are mostly black/white/grey or colored. The codec will do
the right thing with vqmin=2 for example on a cigarette smoke (sharp) or on
a red/yellow cloud (explosion, cloud of fire). But it may not with a grey and
very fuzzy cloud like in the chocolat scene. Note that I don't know exactly
why ;)
A = Rémi