Encoding with MEncoder
Making a high quality MPEG-4 ("DivX") rip of a DVD movie
One frequently asked question is "How do I make the highest quality rip for
a given size?". Another question is "How do I make the highest quality DVD
rip possible? I do not care about file size, I just want the best quality."
The latter question is perhaps at least somewhat wrongly posed. After all, if
you do not care about file size, why not simply copy the entire MPEG-2 video
stream from the DVD? Sure, your AVI will end up being 5GB, give
or take, but if you want the best quality and do not care about size,
this is certainly your best option.
In fact, the reason you want to transcode a DVD into MPEG-4 is
specifically because you do care about
file size.
It is difficult to offer a cookbook recipe on how to create a very high
quality DVD rip. There are several factors to consider, and you should
understand these details or else you are likely to end up disappointed
with your results. Below we will investigate some of these issues, and
then have a look at an example. We assume you are using
libavcodec to encode the video,
although the theory applies to other codecs as well.
If this seems to be too much for you, you should probably use one of the
many fine frontends that are listed in the
MEncoder section
of our related projects page.
That way, you should be able to achieve high quality rips without too much
thinking, because most of those tools are designed to take clever decisions
for you.
Preparing to encode: Identifying source material and framerate
Before you even think about encoding a movie, you need to take
several preliminary steps.
The first and most important step before you encode should be
determining what type of content you are dealing with.
If your source material comes from DVD or broadcast/cable/satellite
TV, it will be stored in one of two formats: NTSC for North
America and Japan, PAL for Europe, etc.
It is important to realize, however, that this is just the formatting for
presentation on a television, and often does
not correspond to the
original format of the movie.
Experience shows that NTSC material is a lot more difficult to encode,
because there are more elements to identify in the source.
In order to produce a suitable encode, you need to know the original
format.
Failure to take this into account will result in various flaws in your
encode, including ugly combing (interlacing) artifacts and duplicated
or even lost frames.
Besides being ugly, the artifacts also harm coding efficiency:
You will get worse quality per unit bitrate.
Identifying source framerate
Here is a list of common types of source material, where you are
likely to find them, and their properties:
Standard Film: Produced for
theatrical display at 24fps.
PAL video: Recorded with a PAL
video camera at 50 fields per second.
A field consists of just the odd- or even-numbered lines of a
frame.
Television was designed to refresh these in alternation as a
cheap form of analog compression.
The human eye supposedly compensates for this, but once you
understand interlacing you will learn to see it on TV too and
never enjoy TV again.
Two fields do not make a
complete frame, because they are captured 1/50 of a second apart
in time, and thus they do not line up unless there is no motion.
NTSC Video: Recorded with an
NTSC video camera at 60000/1001 fields per second, or 60 fields per
second in the pre-color era.
Otherwise similar to PAL.
Animation: Usually drawn at
24fps, but also comes in mixed-framerate varieties.
Computer Graphics (CG): Can be
any framerate, but some are more common than others; 24 and
30 frames per second are typical for NTSC, and 25fps is typical
for PAL.
Old Film: Various lower
framerates.
Identifying source material
Movies consisting of frames are referred to as progressive,
while those consisting of independent fields are called
either interlaced or video - though this latter term is
ambiguous.
To further complicate matters, some movies will be a mix of
several of the above.
The most important distinction to make between all of these
formats is that some are frame-based, while others are
field-based.
Whenever a movie is prepared
for display on television (including DVD), it is converted to a
field-based format.
The various methods by which this can be done are collectively
referred to as "pulldown", of which the infamous NTSC
"3:2 telecine" is one variety.
Unless the original material was also field-based (and the same
fieldrate), you are getting the movie in a format other than the
original.
There are several common types of pulldown:
PAL 2:2 pulldown: The nicest of
them all.
Each frame is shown for the duration of two fields, by extracting the
even and odd lines and showing them in alternation.
If the original material is 24fps, this process speeds up the
movie by 4%.
PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown:
Every 12th frame is shown for the duration of three fields, instead of
just two.
This avoids the 4% speedup issue, but makes the process much
more difficult to reverse.
It is usually seen in musical productions where adjusting the
speed by 4% would seriously damage the musical score.
NTSC 3:2 telecine: Frames are
shown alternately for the duration of 3 fields or 2 fields.
This gives a fieldrate 2.5 times the original framerate.
The result is also slowed down very slightly from 60 fields per
second to 60000/1001 fields per second to maintain NTSC fieldrate.
NTSC 2:2 pulldown: Used for
showing 30fps material on NTSC.
Nice, just like 2:2 PAL pulldown.
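To make the 3:2 cadence concrete, here is a minimal Python sketch that applies it to four film frames labeled A to D and then pairs consecutive fields into stored frames. The function names and frame labels are illustrative assumptions, not anything provided by MPlayer; the sketch only tracks which source frame each field comes from.

```python
from itertools import cycle

def telecine_32(frames):
    """Expand progressive frames into fields using a 3, 2, 3, 2, ... cadence."""
    fields = []
    for frame, repeat in zip(frames, cycle([3, 2])):
        for _ in range(repeat):
            # Field parity simply alternates over the whole output stream.
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields

def pair_fields(fields):
    """Group consecutive field pairs into frames, keeping only the source labels."""
    return [(fields[i][0], fields[i + 1][0]) for i in range(0, len(fields) - 1, 2)]

fields = telecine_32("ABCD")
stored = pair_fields(fields)
# A stored frame is "combed" when its two fields come from different film frames.
combed = [pair for pair in stored if pair[0] != pair[1]]

print(len(fields) / 4)   # fields per original frame
print(stored)
print(len(combed), "of", len(stored), "stored frames are combed")
```

Four film frames become ten fields (a fieldrate 2.5 times the framerate), and two of the five resulting frames mix fields from different film frames.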
There are also methods for converting between NTSC and PAL video,
but such topics are beyond the scope of this guide.
If you encounter such a movie and want to encode it, your best
bet is to find a copy in the original format.
Conversion between these two formats is highly destructive and
cannot be reversed cleanly, so your encode will greatly suffer
if it is made from a converted source.
When video is stored on DVD, consecutive pairs of fields are
grouped as a frame, even though they are not intended to be shown
at the same moment in time.
The MPEG-2 standard used on DVD and digital TV provides a
way both to encode the original progressive frames and to store
the number of fields for which a frame should be shown in the
header of that frame.
If this method has been used, the movie will often be described
as "soft-telecined", since the process only directs the
DVD player to apply pulldown to the movie rather than altering
the movie itself.
This case is highly preferable since it can easily be reversed
(actually ignored) by the encoder, and since it preserves maximal
quality.
However, many DVD and broadcast production studios do not use
proper encoding techniques but instead produce movies with
"hard telecine", where fields are actually duplicated in the
encoded MPEG-2.
The procedures for dealing with these cases will be covered
later in this guide.
For now, we leave you with some guides to identifying which type
of material you are dealing with:
NTSC regions:
If MPlayer prints that the framerate
has changed to 24000/1001 when watching your movie, and never changes
back, it is almost certainly progressive content that has been
"soft telecined".
If MPlayer shows the framerate
switching back and forth between 24000/1001 and 30000/1001, and you see
"combing" at times, then there are several possibilities.
The 24000/1001 fps segments are almost certainly progressive
content, "soft telecined", but the 30000/1001 fps parts could be
either hard-telecined 24000/1001 fps content or 60000/1001 fields per second NTSC video.
Use the same guidelines as the following two cases to determine
which.
If MPlayer never shows the framerate
changing, and every single frame with motion appears combed, your
movie is NTSC video at 60000/1001 fields per second.
If MPlayer never shows the framerate
changing, and two frames out of every five appear combed, your
movie is "hard telecined" 24000/1001fps content.
PAL regions:
If you never see any combing, your movie is 2:2 pulldown.
If you see combing alternating in and out every half second,
then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown.
If you always see combing during motion, then your movie is PAL
video at 50 fields per second.
Hint: MPlayer can slow down movie playback
with the -speed option or play it frame-by-frame.
Try using -speed 0.2 to watch the movie very
slowly or press the "." key repeatedly to play one frame at a time
and identify the pattern, if you cannot see it at full speed.
Constant quantizer vs. multipass
It is possible to encode your movie at a wide range of qualities.
With modern video encoders and a bit of pre-codec compression
(downscaling and denoising), it is possible to achieve very good
quality at 700 MB, for a 90-110 minute widescreen movie.
Furthermore, all but the longest movies can be encoded with near-perfect
quality at 1400 MB.
There are three approaches to encoding the video: constant bitrate
(CBR), constant quantizer, and multipass (ABR, or average bitrate).
The complexity of the frames of a movie, and thus the number of bits
required to compress them, can vary greatly from one scene to another.
Modern video encoders can adjust to these needs as they go and vary
the bitrate.
In simple modes such as CBR, however, the encoders do not know the
bitrate needs of future scenes and so cannot exceed the requested
average bitrate for long stretches of time.
More advanced modes, such as multipass encode, can take into account
the statistics from previous passes; this fixes the problem mentioned
above.
Note:
Most codecs which support ABR encode only support two pass encode
while some others such as x264,
XviD
and libavcodec support
multipass, which slightly improves quality at each pass,
yet this improvement is no longer measurable nor noticeable after the
4th or so pass.
Therefore, in this section, two pass and multipass will be used
interchangeably.
In each of these modes, the video codec (such as
libavcodec)
breaks the video frame into 16x16 pixel macroblocks and then applies a
quantizer to each macroblock. The lower the quantizer, the better the
quality and higher the bitrate.
The method the movie encoder uses to determine
which quantizer to use for a given macroblock varies and is highly
tunable. (This is an extreme over-simplification of the actual
process, but the basic concept is useful to understand.)
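As a rough illustration of why a lower quantizer costs more bits, the following Python sketch applies a plain uniform quantizer to a made-up list of transform coefficients. This is a toy model under stated assumptions: the coefficient values are invented, and real encoders such as libavcodec use quantization matrices and entropy coding that are not modeled here.

```python
def quantize(coefficients, quantizer):
    """Divide each coefficient by the quantizer and round to an integer level."""
    return [round(c / quantizer) for c in coefficients]

def dequantize(levels, quantizer):
    """Reconstruct approximate coefficients from the stored levels."""
    return [level * quantizer for level in levels]

# Hypothetical coefficients for one block, largest (low frequency) first.
coefficients = [620, -95, 41, 17, -9, 6, -3, 1]

for q in (2, 16):
    levels = quantize(coefficients, q)
    restored = dequantize(levels, q)
    nonzero = sum(1 for level in levels if level != 0)
    worst_error = max(abs(a - b) for a, b in zip(coefficients, restored))
    print(f"quantizer={q}: {nonzero} nonzero levels, worst error {worst_error}")
```

With the higher quantizer, fewer levels are nonzero (so fewer bits are needed), but the reconstruction error grows.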
When you specify a constant bitrate, the video codec will encode the video,
discarding
detail as much as necessary and as little as possible in order to remain
lower than the given bitrate. If you truly do not care about file size,
you could as well use CBR and specify a bitrate of infinity. (In
practice, this means a value high enough so that it poses no limit, like
10000Kbit.) With no real restriction on bitrate, the result is that
the codec will use the lowest
possible quantizer for each macroblock (as specified by
for
libavcodec, which is 2 by default).
As soon as you specify a
low enough bitrate that the codec
is forced to use a higher quantizer, then you are almost certainly ruining
the quality of your video.
In order to avoid that, you should probably downscale your video, according
to the method described later on in this guide.
In general, you should avoid CBR altogether if you care about quality.
With constant quantizer, the codec uses the same quantizer, as
specified by the option (for
libavcodec), on every macroblock.
If you want the highest quality rip possible, again ignoring bitrate,
you can use .
This will yield the same bitrate and PSNR (peak signal-to-noise ratio)
as CBR with
=infinity and the default
of 2.
The problem with constant quantizing is that it uses the given quantizer
whether the macroblock needs it or not. That is, it might be possible
to use a higher quantizer on a macroblock without sacrificing visual
quality. Why waste the bits on an unnecessarily low quantizer? Your
CPU has as many cycles as there is time, but there are only so many bits
on your hard disk.
With a two pass encode, the first pass will rip the movie as though it
were CBR, but it will keep a log of properties for each frame. This
data is then used during the second pass in order to make intelligent
decisions about which quantizers to use. During fast action or low
detail scenes, higher quantizers will likely be used, and during
slow moving or high detail scenes, lower quantizers will be used.
If you use , then you are wasting bits. If you
use , then you are not getting the highest
quality rip. Suppose you rip a DVD at , and
the result is 1800Kbit. If you do a two pass encode with
, the resulting video will have higher quality for the
same bitrate.
Since you are now convinced that two pass is the way to go, the real
question now is what bitrate to use? The answer is that there is no
single answer. Ideally you want to choose a bitrate that yields the
best balance between quality and file size. This is going to vary
depending on the source video.
If size does not matter, a good starting point for a very high quality
rip is about 2000Kbit plus or minus 200Kbit.
For fast action or high detail source video, or if you just have a very
critical eye, you might decide on 2400 or 2600.
For some DVDs, you might not notice a difference at 1400Kbit. It is a
good idea to experiment with scenes at different bitrates to get a feel.
If you aim at a certain size, you will have to somehow calculate the bitrate.
But before that, you need to know how much space you should reserve for the
audio track(s), so you should rip
those first.
You can compute the bitrate with the following equation:
bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) *
1024 * 1024 / length_in_secs * 8 / 1000
For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB
of audio track, the video bitrate will have to be:
(702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000
= 748kbps (approximately)
Constraints for efficient encoding
Due to the nature of MPEG-type compression, there are various
constraints you should follow for maximal quality.
MPEG splits the video up into 16x16 squares called macroblocks,
each composed of 4 8x8 blocks of luma (intensity) information and two
half-resolution 8x8 chroma (color) blocks (one for red-cyan axis and
the other for the blue-yellow axis).
Even if your movie width and height are not multiples of 16, the
encoder will use enough 16x16 macroblocks to cover the whole picture
area, and the extra space will go to waste.
So in the interests of maximizing quality at a fixed filesize, it is
a bad idea to use dimensions that are not multiples of 16.
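The cost of dimensions that are not multiples of 16 can be estimated with a short Python sketch. The function name is a hypothetical helper, and the 700x470 frame size is only an example input.

```python
import math

def macroblock_padding(width, height, block=16):
    """Return (macroblock count, padded pixels) for a picture of the given size."""
    cols = math.ceil(width / block)
    rows = math.ceil(height / block)
    coded_pixels = (cols * block) * (rows * block)
    wasted = coded_pixels - width * height
    return cols * rows, wasted

print(macroblock_padding(720, 480))  # both dimensions are multiples of 16
print(macroblock_padding(700, 470))  # the encoder must still cover 704x480
```

For 700x470 the encoder still needs 44x30 macroblocks, so 8920 pixels of coded area carry no picture content.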
Most DVDs also have some degree of black borders at the edges. Leaving
these in place can hurt quality in several ways.
1. MPEG-type compression is also highly dependent on frequency domain
transformations, in particular the Discrete Cosine Transform (DCT),
which is similar to the Fourier transform. This sort of encoding is
efficient for representing patterns and smooth transitions, but it
has a hard time with sharp edges. In order to encode them it must
use many more bits, or else an artifact known as ringing will
appear.
The frequency transform (DCT) takes place separately on each
macroblock (actually each block), so this problem only applies when
the sharp edge is inside a block. If your black borders begin
exactly at multiple-of-16 pixel boundaries, this is not a problem.
However, the black borders on DVDs rarely come nicely aligned, so
in practice you will always need to crop to avoid this penalty.
In addition to frequency domain transforms, MPEG-type compression uses
motion vectors to represent the change from one frame to the next.
Motion vectors naturally work much less efficiently for new content
coming in from the edges of the picture, because it is not present in
the previous frame. As long as the picture extends all the way to the
edge of the encoded region, motion vectors have no problem with
content moving out the edges of the picture. However, in the presence
of black borders, there can be trouble:
2. For each macroblock, MPEG-type compression stores a vector
identifying which part of the previous frame should be copied into
this macroblock as a base for predicting the next frame. Only the
remaining differences need to be encoded. If a macroblock spans the
edge of the picture and contains part of the black border, then
motion vectors from other parts of the picture will overwrite the
black border. This means that lots of bits must be spent either
re-blackening the border that was overwritten, or (more likely) a
motion vector will not be used at all and all the changes in this
macroblock will have to be coded explicitly. Either way, encoding
efficiency is greatly reduced.
Again, this problem only applies if black borders do not line up on
multiple-of-16 boundaries.
3. Finally, suppose we have a macroblock in the interior of the
picture, and an object is moving into this block from near the edge
of the image. MPEG-type coding cannot say "copy the part that is
inside the picture but not the black border." So the black border
will get copied inside too, and lots of bits will have to be spent
encoding the part of the picture that is supposed to be there.
If the picture runs all the way to the edge of the encoded area,
MPEG has special optimizations to repeatedly copy the pixels at the
edge of the picture when a motion vector comes from outside the
encoded area. This feature becomes useless when the movie has black
borders. Unlike problems 1 and 2, aligning the borders at multiples
of 16 does not help here.
4. Despite the borders being entirely black and never changing, there
is at least a minimal amount of overhead involved in having more
macroblocks.
For all of these reasons, it is recommended to fully crop black
borders. Further, if there is an area of noise/distortion at the edge
of the picture, cropping this will improve encoding efficiency as
well. Videophile purists who want to preserve the original as close as
possible may object to this cropping, but unless you plan to encode at
constant quantizer, the quality you gain from cropping will
considerably exceed the amount of information lost at the edges.
Cropping and Scaling
Recall from the previous section that the final picture size you
encode should be a multiple of 16 (in both width and height).
This can be achieved by cropping, scaling, or a combination of both.
When cropping, there are a few guidelines that must be followed to
avoid damaging your movie.
The normal YUV format, 4:2:0, stores chroma (color) information
subsampled, i.e. chroma is only sampled half as often in each
direction as luma (intensity) information.
Observe this diagram, where L indicates luma sampling points and C
chroma.
L L L L L L L L
 C   C   C   C
L L L L L L L L
L L L L L L L L
 C   C   C   C
L L L L L L L L
As you can see, rows and columns of the image naturally come in pairs.
Thus your crop offsets and dimensions must be
even numbers.
If they are not, the chroma will no longer line up correctly with the
luma.
In theory, it is possible to crop with odd offsets, but it requires
resampling the chroma which is potentially a lossy operation and not
supported by the crop filter.
Further, interlaced video is sampled as follows:
Top field          Bottom field
L L L L L L L L
 C   C   C   C
                   L L L L L L L L
L L L L L L L L
                    C   C   C   C
                   L L L L L L L L
L L L L L L L L
 C   C   C   C
                   L L L L L L L L
L L L L L L L L
                    C   C   C   C
                   L L L L L L L L
As you can see, the pattern does not repeat until after 4 lines.
So for interlaced video, your y-offset and height for cropping must
be multiples of 4.
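These divisibility rules are easy to check mechanically. The following Python sketch is a hypothetical helper (not part of MPlayer) that reports which crop parameters break them.

```python
def check_crop(width, height, x, y, interlaced=False):
    """Return a list of problems with a crop rectangle for 4:2:0 video."""
    problems = []
    # Chroma is subsampled by 2 in each direction, so everything must be even.
    for name, value in (("width", width), ("height", height), ("x", x), ("y", y)):
        if value % 2 != 0:
            problems.append(f"{name}={value} is odd")
    # For interlaced video the line pattern only repeats every 4 lines.
    if interlaced:
        for name, value in (("height", height), ("y", y)):
            if value % 4 != 0:
                problems.append(f"{name}={value} is not a multiple of 4")
    return problems

print(check_crop(688, 464, 12, 4))                    # no problems
print(check_crop(688, 462, 12, 6, interlaced=True))   # fine if progressive, not if interlaced
```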
Native DVD resolution is 720x480 for NTSC, and 720x576 for PAL, but
there is an aspect flag that specifies whether it is full-screen (4:3) or
wide-screen (16:9). Many (if not most) widescreen DVDs are not strictly
16:9, and will be either 1.85:1 or 2.35:1 (CinemaScope). This means that
there will be black bands in the video that will need to be cropped out.
MPlayer provides a crop detection filter that
will determine the crop rectangle ().
Run MPlayer with
and it will print out the crop
settings to remove the borders.
You should let the movie run long enough that the whole picture
area is used, in order to get accurate crop values.
Then, test the values you get with MPlayer,
using the command line which was printed by
, and adjust the rectangle as needed.
The filter can help by allowing you to
interactively position the crop rectangle over your movie.
Remember to follow the above divisibility guidelines so that you
do not misalign the chroma planes.
In certain cases, scaling may be undesirable.
Scaling in the vertical direction is difficult with interlaced
video, and if you wish to preserve the interlacing, you should
usually refrain from scaling.
If you will not be scaling but you still want to use multiple-of-16
dimensions, you will have to overcrop.
Do not undercrop, since black borders are very bad for encoding!
Because MPEG-4 uses 16x16 macroblocks, you will want to make sure that each
dimension of the video you are encoding is a multiple of 16 or else you
will be degrading quality, especially at lower bitrates. You can do this
by rounding the width and height of the crop rectangle down to the nearest
multiple of 16.
As stated earlier, when cropping, you will want to increase the Y offset by
half the difference of the old and the new height so that the resulting
video is taken from the center of the frame. And because of the way DVD
video is sampled, make sure the offset is an even number. (In fact, as a
rule, never use odd values for any parameter when you are cropping and
scaling video.) If you are not comfortable throwing a few extra pixels
away, you might prefer to scale the video instead. We will look
at this in our example below.
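The rounding and re-centering described above can be sketched in Python as follows. The function is a hypothetical helper; applying the same centering to the X offset as to the Y offset is an assumption, and the stricter multiple-of-4 rule for interlaced video is not handled here.

```python
def snap_crop(width, height, x, y, multiple=16):
    """Shrink a crop rectangle to multiples of 16 and keep it roughly centered."""
    new_w = width - width % multiple
    new_h = height - height % multiple
    # Shift each offset by half of what was trimmed, rounded down to an even
    # number so that even input offsets stay even.
    dx = (width - new_w) // 2
    dy = (height - new_h) // 2
    new_x = x + dx - dx % 2
    new_y = y + dy - dy % 2
    return new_w, new_h, new_x, new_y

# Example: a detected crop of 720x362 at offset (0, 58).
print(snap_crop(720, 362, 0, 58))
```

In this example the height drops from 362 to 352 and the Y offset moves from 58 to 62, so four lines are trimmed from the top and six from the bottom.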
You can actually let the filter do all of the
above for you, as it has an optional parameter that
is equal to 16 by default.
Also, be careful about "half black" pixels at the edges. Make sure you
crop these out too, or else you will be wasting bits there that
are better spent elsewhere.
After all is said and done, you will probably end up with video whose pixels
are not quite 1.85:1 or 2.35:1, but rather something close to that. You
could calculate the new aspect ratio manually, but
MEncoder offers an option for libavcodec called
that will do this for you. Absolutely do not scale this video up in order to
square the pixels unless you like to waste your hard disk space. Scaling
should be done on playback, and the player will use the aspect stored in
the AVI to determine the correct resolution.
Unfortunately, not all players enforce this auto-scaling information,
therefore you may still want to rescale.
Choosing resolution and bitrate
If you will not be encoding in constant quantizer mode, you need to
select a bitrate.
The concept of bitrate is quite simple.
It is the (average) number of bits that will be consumed to store your
movie, per second.
Normally bitrate is measured in kilobits (1000 bits) per second.
The size of your movie on disk is the bitrate times the length of the
movie in time, plus a small amount of "overhead" (see the section on
the AVI container
for instance).
Other parameters such as scaling, cropping, etc. will
not alter the file size unless you
change the bitrate as well!
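The relationship between size, duration, and bitrate can be sketched in Python. The helper names are illustrative assumptions; the conventions (1 MB = 1024*1024 bytes, 1 kbit = 1000 bits) follow the bitrate equation given earlier in this guide, and container overhead is ignored.

```python
def video_bitrate_kbps(target_size_mb, audio_size_mb, length_secs):
    """Video bitrate (kbit/s) that fills the space left after the audio."""
    video_bytes = (target_size_mb - audio_size_mb) * 1024 * 1024
    return video_bytes * 8 / length_secs / 1000

def video_size_mb(bitrate_kbps, length_secs):
    """Inverse relation: size in MB of a video stream at the given bitrate."""
    return bitrate_kbps * 1000 * length_secs / 8 / (1024 * 1024)

# A two-hour movie on a 702MB CD with 60MB reserved for audio.
bitrate = video_bitrate_kbps(702, 60, 120 * 60)
print(round(bitrate))                      # about 748 kbit/s
print(video_size_mb(bitrate, 120 * 60))    # back to roughly 642 MB of video
```

Note that neither function takes the resolution as an input: for a fixed bitrate and length, the size stays the same whatever the picture dimensions are.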
Bitrate does not scale proportionally
to resolution.
That is to say, a 320x240 file at 200 kbit/sec will not be the same
quality as the same movie at 640x480 and 800 kbit/sec!
There are two reasons for this:
Perceptual: You notice MPEG
artifacts more if they are scaled up bigger!
Artifacts appear on the scale of blocks (8x8).
Your eye will not see errors in 4800 small blocks as easily as it
sees errors in 1200 large blocks (assuming you will be scaling both
to fullscreen).
Theoretical: When you scale down
an image but still use the same size (8x8) blocks for the frequency
space transform, you move more data to the high frequency bands.
Roughly speaking, each pixel contains more of the detail than it
did before.
So even though your scaled-down picture contains 1/4 the information
in the spatial directions, it could still contain a large portion
of the information in the frequency domain (assuming that the high
frequencies were underutilized in the original 640x480 image).
Past guides have recommended choosing a bitrate and resolution based
on a "bits per pixel" approach, but this is usually not valid due to
the above reasons.
A better estimate seems to be that bitrates scale proportional to the
square root of resolution, so that 320x240 and 400 kbit/sec would be
comparable to 640x480 at 800 kbit/sec.
However this has not been verified with theoretical or empirical
rigor.
Further, given that movies vary greatly with regard to noise, detail,
degree of motion, etc., it is futile to make general recommendations
for bits per length-of-diagonal (the analog of bits per pixel,
using the square root).
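For comparison, here is the square-root rule of thumb as a small Python sketch. The function name is an assumption, and, as noted above, the rule itself is unverified, so treat the output as a starting point for experiments rather than a recommendation.

```python
import math

def comparable_bitrate(ref_bitrate_kbps, ref_size, new_size):
    """Scale a bitrate with the square root of the pixel-count ratio."""
    ref_pixels = ref_size[0] * ref_size[1]
    new_pixels = new_size[0] * new_size[1]
    return ref_bitrate_kbps * math.sqrt(new_pixels / ref_pixels)

# 640x480 at 800 kbit/s, rescaled to 320x240.
print(comparable_bitrate(800, (640, 480), (320, 240)))  # 400.0
# A plain "bits per pixel" rule would have suggested 800 / 4 = 200 instead.
```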
So far we have discussed the difficulty of choosing a bitrate and
resolution.
Computing the resolution
The following steps will guide you in computing the resolution of your
encode without distorting the video too much, by taking into account several
types of information about the source video.
First, you should compute the encoded aspect ratio:
ARc = (Wc x (ARa / PRdvd)) / Hc
where:
Wc and Hc are the width and height of the cropped video,
ARa is the displayed aspect ratio, which usually is 4/3 or 16/9,
PRdvd is the pixel ratio of the DVD which is equal to 1.25=(720/576) for PAL
DVDs and 1.5=(720/480) for NTSC DVDs.
Then, you can compute the X and Y resolution, according to a certain
Compression Quality (CQ) factor:
ResY = INT(SQRT( 1000*Bitrate/25/ARc/CQ )/16) * 16
and
ResX = INT( ResY * ARc / 16) * 16
Okay, but what is the CQ?
The CQ represents the number of bits per pixel and per frame of the encode.
Roughly speaking, the greater the CQ, the less the likelihood to see
encoding artifacts.
However, if you have a target size for your movie (1 or 2 CDs for instance),
there is a limited total number of bits that you can spend; therefore it is
necessary to find a good tradeoff between compressibility and quality.
The CQ depends on the bitrate, the video codec efficiency and the
movie resolution.
In order to raise the CQ, you would typically downscale the movie, since the
bitrate is computed as a function of the target size and the length of the
movie, which are constant.
With MPEG-4 ASP codecs such as XviD
and libavcodec, a CQ below 0.18
usually results in a pretty blocky picture, because there
are not enough bits to code the information of each macroblock. (MPEG4, like
many other codecs, groups pixels by blocks of several pixels to compress the
image; if there are not enough bits, the edges of those blocks are
visible.)
It is therefore wise to take a CQ ranging from 0.20 to 0.22 for a 1 CD rip,
and 0.26 to 0.28 for a 2 CD rip with standard encoding options.
More advanced encoding options such as those listed here for
libavcodec
and
XviD
should make it possible to get the same quality with CQ ranging from
0.18 to 0.20 for a 1 CD rip, and 0.24 to 0.26 for a 2 CD rip.
With MPEG-4 AVC codecs such as x264,
you can use a CQ ranging from 0.14 to 0.16 with standard encoding options,
and should be able to go as low as 0.10 to 0.12 with
x264's advanced encoding settings.
Please take note that the CQ is just an indicative figure: depending on
the encoded content, a CQ of 0.18 may look just fine for a Bergman film, but
not for a movie such as The Matrix, which contains many high-motion scenes.
On the other hand, it is pointless to raise CQ higher than 0.30 as you would
be wasting bits without any noticeable quality gain.
Also note that as mentioned earlier in this guide, low resolution videos
need a bigger CQ (compared to, for instance, DVD resolution) to look good.
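The resolution formulas from this section can be put together in a short Python sketch. The function names, the 720x432 crop, the 900 kbit/s bitrate, and the CQ of 0.22 are illustrative assumptions; the fps parameter defaults to 25, the constant used in the formula above, and using any other value is an extrapolation.

```python
import math

# Pixel ratio of the DVD frame (PRdvd): 720/576 for PAL, 720/480 for NTSC.
PIXEL_RATIO = {"PAL": 720 / 576, "NTSC": 720 / 480}

def encoded_aspect(crop_w, crop_h, display_aspect, standard):
    """ARc = (Wc x (ARa / PRdvd)) / Hc"""
    return (crop_w * (display_aspect / PIXEL_RATIO[standard])) / crop_h

def resolution(bitrate_kbps, arc, cq, fps=25):
    """ResY and ResX for a target CQ, both rounded down to multiples of 16."""
    res_y = int(math.sqrt(1000 * bitrate_kbps / fps / arc / cq) / 16) * 16
    res_x = int(res_y * arc / 16) * 16
    return res_x, res_y

def actual_cq(bitrate_kbps, res_x, res_y, fps=25):
    """Bits per pixel and per frame that the chosen resolution really gets."""
    return 1000 * bitrate_kbps / (fps * res_x * res_y)

# Hypothetical 16:9 PAL DVD cropped to 720x432, encoded at 900 kbit/s.
arc = encoded_aspect(720, 432, 16 / 9, "PAL")
res_x, res_y = resolution(900, arc, 0.22)
print(res_x, res_y)
print(round(actual_cq(900, res_x, res_y), 3))
```

Because both dimensions are rounded down, the CQ actually obtained ends up slightly above the requested value.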
Filtering
Learning how to use MEncoder's video filters
is essential to producing good encodes.
All video processing is performed through the filters -- cropping,
scaling, color adjustment, noise removal, sharpening, deinterlacing,
telecine, inverse telecine, and deblocking, just to name a few.
Along with the vast number of supported input formats, the variety of
filters available in MEncoder is one of its
main advantages over other similar programs.
Filters are loaded in a chain using the -vf option:
-vf filter1=options,filter2=options,...
Most filters take several numeric options separated by colons, but
the syntax for options varies from filter to filter, so read the man
page for details on the filters you wish to use.
Filters operate on the video in the order they are loaded.
For example, the following chain:
-vf crop=688:464:12:4,scale=640:464
will first crop the 688x464 region of the picture with upper-left
corner at (12,4), and then scale the result down to 640x464.
Certain filters need to be loaded at or near the beginning of the
filter chain, in order to take advantage of information from the
video decoder that will be lost or invalidated by other filters.
The principal examples are (postprocessing, only
when it is performing deblock or dering operations),
(another postprocessor to remove MPEG artifacts),
(inverse telecine), and
(for converting soft telecine to hard
telecine).
In general, you want to do as little filtering as possible to the movie
in order to remain close to the original DVD source. Cropping is often
necessary (as described above), but avoid scaling the video. Although
scaling down is sometimes preferred to using higher quantizers, we want
to avoid both these things: remember that we decided from the start to
trade bits for quality.
Also, do not adjust gamma, contrast, brightness, etc. What looks good
on your display may not look good on others. These adjustments should
be done on playback only.
One thing you might want to do, however, is pass the video through a
very light denoise filter, such as .
Again, it is a matter of putting those bits to better use: why waste them
encoding noise when you can just add that noise back in during playback?
Increasing the parameters for will further
improve compressibility, but if you increase the values too much, you
risk degrading the image visibly. The suggested values above
() are quite conservative; you should feel free to
experiment with higher values and observe the results for yourself.
Interlacing and Telecine
Almost all movies are shot at 24 fps. Because NTSC is 30000/1001 fps, some
processing must be done to this 24 fps video to make it run at the correct
NTSC framerate. The process is called 3:2 pulldown, commonly referred to
as telecine (because pulldown is often applied during the telecine
process), and, naively described, it works by slowing the film down to
24000/1001 fps, and repeating every fourth frame.
No special processing, however, is done to the video for PAL DVDs, which
run at 25 fps. (Technically, PAL can be telecined, called 2:2 pulldown,
but this does not become an issue in practice.) The 24 fps film is simply
played back at 25 fps. The result is that the movie runs slightly faster,
but unless you are an alien, you probably will not notice the difference.
Most PAL DVDs have pitch-corrected audio, so when they are played back at
25 fps things will sound right, even though the audio track (and hence the
whole movie) has a running time that is 4% less than NTSC DVDs.
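The framerate arithmetic in the paragraphs above can be checked with exact fractions. This Python sketch only restates those numbers; the variable names are illustrative.

```python
from fractions import Fraction

film = Fraction(24)

# NTSC: slow the film down by 1000/1001, then repeat one frame in every four,
# which turns 4 frames into 5.
slowed = film * Fraction(1000, 1001)
ntsc = slowed * Fraction(5, 4)
print(slowed, ntsc)

# PAL: the 24 fps film is simply played at 25 fps.
pal_speedup = Fraction(25, 24) - 1           # how much faster it plays
pal_runtime_saving = 1 - Fraction(24, 25)    # how much shorter it runs
print(float(pal_speedup) * 100, float(pal_runtime_saving) * 100)
```

The playback speed increases by about 4.17%, which corresponds to a running time that is exactly 4% shorter.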
Because the video in a PAL DVD has not been altered, you need not worry
much about framerate. The source is 25 fps, and your rip will be 25
fps. However, if you are ripping an NTSC DVD movie, you may need to
apply inverse telecine.
For movies shot at 24 fps, the video on the NTSC DVD is either telecined
30000/1001, or else it is progressive 24000/1001 fps and intended to be telecined
on-the-fly by a DVD player. On the other hand, TV series are usually
only interlaced, not telecined. This is not a hard rule: some TV series
are interlaced (such as Buffy the Vampire Slayer) whereas some are a
mixture of progressive and interlaced (such as Angel, or 24).
It is highly recommended that you read the section on
How to deal with telecine and interlacing in NTSC DVDs
to learn how to handle the different possibilities.
However, if you are mostly just ripping movies, you are likely
dealing with either 24 fps progressive or telecined video, in which case you can
use the pullup filter.
Encoding interlaced video
If the movie you want to encode is interlaced (NTSC video or
PAL video), you will need to choose whether you want to
deinterlace or not.
While deinterlacing will make your movie usable on progressive
scan displays such as computer monitors and projectors, it comes
at a cost: The fieldrate of 50 or 60000/1001 fields per second
is halved to 25 or 30000/1001 frames per second, and roughly half of
the information in your movie will be lost during scenes with
significant motion.
Therefore, if you are encoding for high quality archival purposes,
it is recommended not to deinterlace.
You can always deinterlace the movie at playback time when
displaying it on progressive scan devices, and future players will
be able to deinterlace to full fieldrate, interpolating 50 or
60000/1001 entire frames per second from the interlaced video.
Special care must be taken when working with interlaced video:
Crop height and y-offset must be multiples of 4.
Any vertical scaling must be performed in interlaced mode.
Postprocessing and denoising filters may not work as expected
unless you take special care to operate them a field at a time,
and they may damage the video if used incorrectly.
With these things in mind, here is our first example:
mencoder capture.avi -mc 0 -oac lavc -ovc lavc -lavcopts \
vcodec=mpeg2video:vbitrate=6000:ilme:ildct:acodec=mp2:abitrate=224
Note the ilme and ildct options.
Notes on Audio/Video synchronization
MEncoder's audio/video synchronization
algorithms were designed with the intention of recovering files with
broken sync.
However, in some cases they can cause unnecessary skipping and duplication of
frames, and possibly slight A/V desync, when used with proper input
(of course, A/V sync issues apply only if you process or copy the
audio track while transcoding the video, which is strongly encouraged).
Therefore, you may have to switch to basic A/V sync with
the -mc 0 option, or put this in your
~/.mplayer/mencoder config file, as long as
you are only working with good sources (DVD, TV capture, high quality
MPEG-4 rips, etc) and not broken ASF/RM/MOV files.
If you want to further guard against strange frame skips and
duplication, you can use both -mc 0 and
-noskip.
This will prevent all A/V sync, and copy frames
one-to-one, so you cannot use it if you will be using any filters that
unpredictably add or drop frames, or if your input file has variable
framerate!
Therefore, using -noskip is not in general recommended.
The so-called "three-pass" audio encoding which MEncoder
supports has been reported to cause A/V desync.
This will definitely happen if it is used in conjunction with certain
filters, therefore, it is now recommended not to
use three-pass audio mode.
This feature is only left for compatibility purposes and for expert
users who understand when it is safe to use and when it is not.
If you have never heard of three-pass mode before, forget that we
even mentioned it!
There have also been reports of A/V desync when encoding from stdin
with MEncoder.
Do not do this! Always use a file or CD/DVD/etc device as input.
Audio
Audio is a much simpler problem to solve: if you care about quality, just
leave it as is.
Even AC3 5.1 streams are at most 448Kbit/s, and they are worth every bit.
You might be tempted to transcode the audio to high quality Vorbis, but
just because you do not have an A/V receiver for AC3 pass-through today
does not mean you will not have one tomorrow. Future-proof your DVD rips by
preserving the AC3 stream.
You can keep the AC3 stream by copying it directly into the video
stream during the encoding.
You can also extract the AC3 stream in order to mux it into containers such
as NUT or Matroska.
mplayer source_file.vob -aid 129 -dumpaudio -dumpfile sound.ac3
will dump into the file sound.ac3 the
audio track number 129 from the file
source_file.vob (NB: DVD VOB files
usually use a different audio numbering,
which means that the VOB audio track 129 is the 2nd audio track of the file).
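The numbering mentioned above can be sketched as follows. This assumes the usual VOB convention in which AC3 track IDs start at 128; the helper name `ac3_aid` is made up, and the command is only printed, not executed.

```shell
#!/bin/sh
# Map a 1-based AC3 track position to an -aid value, assuming
# AC3 track IDs in VOB files start at 128 (hypothetical helper).
ac3_aid() {
    echo $(( 128 + $1 - 1 ))
}

# Print (not run) the dump command for the second AC3 track.
echo "mplayer source_file.vob -aid $(ac3_aid 2) -dumpaudio -dumpfile sound.ac3"
```

With this convention the second AC3 track maps to 129, matching the example in the text. Other audio types in a VOB may use different ID ranges, so check what MPlayer reports for your source.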
But sometimes you truly have no choice but to further compress the
sound so that more bits can be spent on the video.
Most people choose to compress audio with either MP3 or Vorbis audio
codecs.
While the latter is a very space-efficient codec, MP3 is better supported
by hardware players, although this trend is changing.
Do not use -nosound when encoding
a file with audio, even if you will be encoding and muxing audio
separately later.
Though it may work in ideal cases, using -nosound is
likely to hide some problems in your encoding command line setting.
In other words, having a soundtrack during your encode assures you that,
provided you do not see messages such as
Too many audio packets in the buffer, you will be able
to get proper sync.
You need to have MEncoder process the sound.
You can for example copy the original soundtrack during the encode with
-oac copy or convert it to a "light" 4 kHz mono WAV
PCM with -oac pcm -channels 1 -srate 4000.
Otherwise, in some cases, it will generate a video file that will not sync
with the audio.
Such cases are when the number of video frames in the source file does
not match up to the total length of audio frames or whenever there
are discontinuities/splices where there are missing or extra audio frames.
The correct way to handle this kind of problem is to insert silence or
cut audio at these points.
However MPlayer cannot do that, so if you
demux the AC3 audio and encode it with a separate app (or dump it to PCM with
MPlayer), the splices will be left incorrect
and the only way to correct them is to drop/dup video frames at the
splice.
As long as MEncoder sees the audio when it is
encoding the video, it can do this dropping/duping (which is usually OK
since it takes place at full black/scenechange), but if
MEncoder cannot see the audio, it will just
process all frames as-is and they will not fit the final audio stream when
you, for example, merge your audio and video tracks into a Matroska file.
First of all, you will have to convert the DVD sound into a WAV file that the
audio codec can use as input.
For example:
mplayer source_file.vob -ao pcm:file=destination_sound.wav -vc dummy -aid 1 -vo null
will dump the second audio track from the file
source_file.vob into the file
destination_sound.wav.
You may want to normalize the sound before encoding, as DVD audio tracks
are commonly recorded at low volumes.
You can use the tool normalize for instance,
which is available in most distributions.
If you are using Windows, a tool such as BeSweet
can do the same job.
You will compress in either Vorbis or MP3.
For example:
oggenc -q1 destination_sound.wav
will encode destination_sound.wav with
the encoding quality 1, which is roughly equivalent to 80Kb/s, and
is the minimum quality at which you should encode if you care about
quality.
Please note that MEncoder currently cannot mux Vorbis audio tracks
into the output file because it only supports AVI and MPEG
containers as output, each of which may lead to audio/video
playback synchronization problems with some players when the file
contains VBR audio streams such as Vorbis.
Do not worry, this document will show you how you can do that with third
party programs.
Muxing
Now that you have encoded your video, you will most likely want
to mux it with one or more audio tracks into a movie container, such
as AVI, MPEG, Matroska or NUT.
MEncoder is currently only able to natively output
audio and video into MPEG and AVI container formats.
For example:
mencoder -oac copy -ovc copy -o output_movie.avi -audiofile input_audio.mp2 input_video.avi
This would merge the video file input_video.avi
and the audio file input_audio.mp2
into the AVI file output_movie.avi.
This command works with MPEG-1 layer I, II and III (more commonly known
as MP3) audio, WAV and a few other audio formats too.
MEncoder features experimental support for
libavformat, which is a
library from the FFmpeg project that supports muxing and demuxing
a variety of containers.
For example:
mencoder -oac copy -ovc copy -o output_movie.asf -audiofile input_audio.mp2 input_video.avi -of lavf -lavfopts format=asf
This will do the same thing as the previous example, except that
the output container will be ASF.
Please note that this support is highly experimental (but getting
better every day), and will only work if you compiled
MPlayer with the support for
libavformat enabled (which
means that a pre-packaged binary version will not work in most cases).
Improving muxing and A/V sync reliability
You may experience some serious A/V sync problems while trying to mux
your video and some audio tracks, where no matter how you adjust the
audio delay, you will never get proper sync.
That may happen when you use some video filters that will drop or
duplicate some frames, like the inverse telecine filters.
It is strongly encouraged to append the harddup video
filter at the end of the filter chain to avoid this kind of problem.
Without harddup, if MEncoder
wants to duplicate a frame, it relies on the muxer to put a mark on the
container so that the last frame will be displayed again to maintain
sync while writing no actual frame.
With harddup, MEncoder
will instead just push the last frame displayed again into the filter
chain.
This means that the encoder receives the exact
same frame twice, and compresses it.
This will result in a slightly bigger file, but will not cause problems
when demuxing or remuxing into other container formats.
You may also have no choice but to use harddup with
container formats that are not too tightly linked with
MEncoder such as the ones supported through
libavformat, which may not
support frame duplication at the container level.
Limitations of the AVI container
Although it is the most widely-supported container format after MPEG-1,
AVI also has some major drawbacks.
Perhaps the most obvious is the overhead.
For each chunk of the AVI file, 24 bytes are wasted on headers and
index.
This translates into a little over 5 MB per hour, or 1-2.5%
overhead for a 700 MB movie. This may not seem like much, but it could
mean the difference between being able to use 700 kbit/sec video or
714 kbit/sec, and every bit of quality counts.
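The per-hour figure can be roughly reproduced with integer arithmetic. This sketch rests on assumptions that are not stated in the text: one chunk per video frame at 24000/1001 fps, and one chunk per MP3 frame (1152 samples) at 48 kHz. The function name is made up.

```shell
#!/bin/sh
# Rough AVI header/index overhead per hour, in bytes.
# Assumptions (not from the text): one chunk per video frame at
# 24000/1001 fps, one chunk per MP3 frame (1152 samples at 48 kHz),
# and 24 bytes of overhead per chunk.
avi_overhead_per_hour() {
    video_chunks=$(( 3600 * 24000 / 1001 ))
    audio_chunks=$(( 3600 * 48000 / 1152 ))
    echo $(( (video_chunks + audio_chunks) * 24 ))
}

avi_overhead_per_hour   # prints: 5671512
```

Under these assumptions the overhead comes to about 5.7 million bytes per hour, in line with "a little over 5 MB". A different audio chunking would change the number.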
In addition to this gross inefficiency, AVI also has the following major
limitations:
Only fixed-fps content can be stored. This is particularly limiting
if the original material you want to encode is mixed content, for
example a mix of NTSC video and film material.
Actually there are hacks that can be used to store mixed-framerate
content in AVI, but they increase the (already huge) overhead
fivefold or more and so are not practical.
Audio in AVI files must be either constant-bitrate (CBR) or
constant-framesize (i.e. all frames decode to the same number of
samples).
Unfortunately, the most efficient codec, Vorbis, does not meet
either of these requirements.
Therefore, if you plan to store your movie in AVI, you will have to
use a less efficient codec such as MP3 or AC3.
Having said all that, MEncoder does not
currently support variable-fps output or Vorbis encoding.
Therefore, you may not see these as limitations if
MEncoder is the
only tool you will be using to produce your encodes.
However, it is possible to use MEncoder
only for video encoding, and then use external tools to encode
audio and mux it into another container format.
Muxing into the Matroska container
Matroska is a free, open standard container format, aiming
to offer a lot of advanced features, which older containers
like AVI cannot handle.
For example, Matroska supports variable bitrate audio content
(VBR), variable framerates (VFR), chapters, file attachments,
error detection code (EDC) and modern A/V Codecs like "Advanced Audio
Coding" (AAC), "Vorbis" or "MPEG-4 AVC" (H.264), almost none of which
AVI can handle.
The tools required to create Matroska files are collectively called
mkvtoolnix, and are available for most
Unix platforms as well as Windows.
Because Matroska is an open standard you may find other
tools that suit you better, but since mkvtoolnix is the most
common, and is supported by the Matroska team itself, we will
only cover its usage.
Probably the easiest way to get started with Matroska is to use
MMG, the graphical frontend shipped with
mkvtoolnix, and follow the
guide to mkvmerge GUI (mmg).
You may also mux audio and video files using the command line:
mkvmerge -o output.mkv input_video.avi input_audio1.mp3 input_audio2.ac3
This would merge the video file input_video.avi
and the two audio files input_audio1.mp3
and input_audio2.ac3 into the Matroska
file output.mkv.
Matroska, as mentioned earlier, is able to do much more than that, like
multiple audio tracks (including fine-tuning of audio/video
synchronization), chapters, subtitles, splitting, etc...
Please refer to the documentation of those applications for
more details.
How to deal with telecine and interlacing within NTSC DVDs
Introduction
What is telecine?
I suggest you visit this page if you do not understand much of what
is written in this document:
http://www.divx.com/support/guides/guide.php?gid=10
This URL links to an understandable and reasonably comprehensive
description of what telecine is.
A note about the numbers.
Many documents, including the guide linked above, refer to the fields
per second value of NTSC video as 59.94 and the corresponding frames
per second values as 29.97 (for telecined and interlaced) and 23.976
(for progressive). For simplicity, some documents even round these
numbers to 60, 30, and 24.
Strictly speaking, all those numbers are approximations. Black and
white NTSC video was exactly 60 fields per second, but 60000/1001
was later chosen to accommodate color data while remaining compatible
with contemporary black and white televisions. Digital NTSC video
(such as on a DVD) is also 60000/1001 fields per second. From this,
interlaced and telecined video are derived to be 30000/1001 frames
per second; progressive video is 24000/1001 frames per second.
Older versions of the MEncoder documentation
and many archived mailing list posts refer to 59.94, 29.97, and 23.976.
All MEncoder documentation has been updated
to use the fractional values, and you should use them too.
-ofps 23.976 is incorrect.
-ofps 24000/1001 should be used instead.
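To see how the familiar decimal figures relate to the exact fractions, the following sketch prints each fraction rounded to three decimals. The helper name `approx_fps` is made up, and a standard awk is assumed to be available.

```shell
#!/bin/sh
# Print a framerate fraction rounded to three decimal places
# (hypothetical helper; relies on a standard awk).
approx_fps() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f\n", n / d }'
}

approx_fps 60000 1001   # prints: 59.940
approx_fps 30000 1001   # prints: 29.970
approx_fps 24000 1001   # prints: 23.976
```

The decimals are truncations of repeating values, which is why passing the fraction itself avoids slow drift over a long encode.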
How telecine is used.
All video intended to be displayed on an NTSC
television set must be 60000/1001 fields per second. Made-for-TV movies
and shows are often filmed directly at 60000/1001 fields per second, but
the majority of cinema is filmed at 24 or 24000/1001 frames per
second. When cinematic movie DVDs are mastered, the video is then
converted for television using a process called telecine.
On a DVD, the video is never actually stored as 60000/1001 fields per
second. For video that was originally 60000/1001, each pair of fields is
combined to form a frame, resulting in 30000/1001 frames per
second. Hardware DVD players then read a flag embedded in the video
stream to determine whether the odd- or even-numbered lines should
form the first field.
Usually, 24000/1001 frames per second content stays as it is when
encoded for a DVD, and the DVD player must perform telecining
on-the-fly. Sometimes, however, the video is telecined
before being stored on the DVD; even though it
was originally 24000/1001 frames per second, it becomes 60000/1001 fields per
second. When it is stored on the DVD, pairs of fields are combined to form
30000/1001 frames per second.
When looking at individual frames formed from 60000/1001 fields per
second video, telecined or otherwise, interlacing is clearly visible
wherever there is any motion, because one field (say, the
even-numbered lines) represents a moment in time 1/(60000/1001)
seconds later than the other. Playing interlaced video on a computer
looks ugly both because the monitor is higher resolution and because
the video is shown frame-after-frame instead of field-after-field.
Notes:
This section only applies to NTSC DVDs, and not PAL.
The example MEncoder lines throughout the
document are not intended for
actual use. They are simply the bare minimum required to encode the
pertaining video category. How to make good DVD rips or fine-tune
libavcodec for maximal
quality is not within the scope of this document.
There are a couple footnotes specific to this guide, linked like this:
[1]
How to tell what type of video you have
Progressive
Progressive video was originally filmed at 24000/1001 fps, and stored
on the DVD without alteration.
When you play a progressive DVD in MPlayer,
MPlayer will print the following line as
soon as the movie begins to play:
demux_mpg: 24000/1001 fps progressive NTSC content detected, switching framerate.
From this point forward, demux_mpg should never say it finds
"30000/1001 fps NTSC content."
When you watch progressive video, you should never see any
interlacing. Beware, however, because sometimes there is a tiny bit
of telecine mixed in where you would not expect. I have encountered TV
show DVDs that have one second of telecine at every scene change, or
at seemingly random places. I once watched a DVD that had a
progressive first half, and the second half was telecined. If you
want to be really thorough, you can scan the
entire movie:
mplayer dvd://1 -nosound -vo null -benchmark
Using -benchmark makes
MPlayer play the movie as quickly as it
possibly can; still, depending on your hardware, it can take a
while. Every time demux_mpg reports a framerate change, the line
immediately above will show you the time at which the change
occurred.
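If you save the output of such a scan, counting the framerate switches can be automated. This is a minimal sketch: the function name is made up, the sample input below is fabricated for illustration (only the demux_mpg line is taken from this document), and the pattern assumes every switch message contains "switching framerate".

```shell
#!/bin/sh
# Count framerate-switch messages on standard input
# (hypothetical helper; assumes each switch message contains
# "demux_mpg:" followed by "switching framerate").
count_fps_switches() {
    grep -c 'demux_mpg:.*switching framerate'
}

# Fabricated sample input, for illustration only.
count_fps_switches <<'EOF'
demux_mpg: 24000/1001 fps progressive NTSC content detected, switching framerate.
(other player output)
demux_mpg: 24000/1001 fps progressive NTSC content detected, switching framerate.
EOF
# prints: 2
```

In practice you would pipe the scan's output (both stdout and stderr) into the function. A count of at most one suggests the title never switches back to 30000/1001 fps.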
Sometimes progressive video on DVDs is referred to as
"soft-telecine" because it is intended to
be telecined by the DVD player.
Telecined
Telecined video was originally filmed at 24000/1001, but was telecined
before it was written to the DVD.
MPlayer does not (ever) report any
framerate changes when it plays telecined video.
Watching a telecined video, you will see interlacing artifacts that
seem to "blink": they repeatedly appear and disappear.
You can look closely at this by
mplayer dvd://1
Seek to a part with motion.
Use the . key to step forward one frame at a time.
Look at the pattern of interlaced-looking and progressive-looking
frames. If the pattern you see is PPPII,PPPII,PPPII,... then the
video is telecined. If you see some other pattern, then the video
may have been telecined using some non-standard method;
MEncoder cannot losslessly convert
non-standard telecine to progressive. If you do not see any
pattern at all, then it is most likely interlaced.
Sometimes telecined video on DVDs is referred to as
"hard-telecine". Since hard-telecine is already 60000/1001 fields
per second, the DVD player plays the video without any manipulation.
Another way to tell if your source is telecined or not is to play
the source with the -vf pullup and -v
command line options to see how pullup matches frames.
If the source is telecined, you should see on the console a 3:2 pattern
with 0+.1.+2 and 0++1
alternating.
This technique has the advantage that you do not need to watch the
source to identify it, which could be useful if you wish to automate
the encoding procedure, or to carry out said procedure remotely via
a slow connection.
Interlaced
Interlaced video was originally filmed at 60000/1001 fields per second,
and stored on the DVD as 30000/1001 frames per second. The interlacing effect
(often called "combing") is a result of combining pairs of
fields into frames. Each field is supposed to be 1/(60000/1001) seconds apart,
and when they are displayed simultaneously the difference is apparent.
As with telecined video, MPlayer should
not ever report any framerate changes when playing interlaced content.
When you view an interlaced video closely by frame-stepping with the
. key, you will see that every single frame is interlaced.
Mixed progressive and telecine
All of a "mixed progressive and telecine" video was originally
24000/1001 frames per second, but some parts of it ended up being telecined.
When MPlayer plays this category, it will
(often repeatedly) switch back and forth between "30000/1001 fps NTSC"
and "24000/1001 fps progressive NTSC". Watch the bottom of
MPlayer's output to see these messages.
You should check the "30000/1001 fps NTSC" sections to make sure
they are actually telecine, and not just interlaced.
Mixed progressive and interlaced
In "mixed progressive and interlaced" content, progressive
and interlaced video have been spliced together.
This category looks just like "mixed progressive and telecine",
until you examine the 30000/1001 fps sections and see that they do not have the
telecine pattern.
How to encode each category
As I mentioned in the beginning, example MEncoder
lines below are not meant to actually be used;
they only demonstrate the minimum parameters to properly encode each category.
Progressive
Progressive video requires no special filtering to encode. The only
parameter you need to be sure to use is
-ofps 24000/1001. Otherwise, MEncoder
will try to encode at 30000/1001 fps and will duplicate frames.
mencoder dvd://1 -oac copy -ovc lavc -ofps 24000/1001
It is often the case, however, that a video that looks progressive
actually has very short parts of telecine mixed in. Unless you are
sure, it is safest to treat the video as
mixed progressive and telecine.
The performance loss is small
[3].
Telecined
Telecine can be reversed to retrieve the original 24000/1001 content,
using a process called inverse-telecine.
MPlayer contains several filters to
accomplish this; the best filter, pullup, is described
in the mixed
progressive and telecine section.
Interlaced
For most practical cases it is not possible to retrieve a complete
progressive video from interlaced content. The only way to do so
without losing half of the vertical resolution is to double the
framerate and try to "guess" what ought to make up the
corresponding lines for each field (this has drawbacks - see method
3).
Encode the video in interlaced form. Normally, interlacing wreaks
havoc with the encoder's ability to compress well, but
libavcodec has two
parameters specifically for dealing with storing interlaced video a
bit better: ildct and ilme. Also,
using mbd=2 is strongly recommended
[2] because it
will encode macroblocks as non-interlaced in places where there is
no motion. Note that -ofps is NOT needed here.
mencoder dvd://1 -oac copy -ovc lavc -lavcopts ildct:ilme:mbd=2
Use a deinterlacing filter before encoding. There are several of
these filters available to choose from, each with its own advantages
and disadvantages. Consult the MPlayer documentation to see
what is available (grep for "deint"), and search the
MPlayer mailing lists to find many discussions about the
various filters. Again, the framerate is not changing, so no
-ofps is needed. Also, deinterlacing should be done after
cropping [1] and
before scaling.
mencoder dvd://1 -oac copy -vf pp=lb -ovc lavc
Unfortunately, this option is buggy with
MEncoder; it ought to work well with
MEncoder G2, but that is not here yet. You
might experience crashes. Anyway, the purpose of -vf tfields
is to create a full frame out of each field, which
makes the framerate 60000/1001. The advantage of this approach is that no
data is ever lost; however, since each frame comes from only one
field, the missing lines have to be interpolated somehow. There are
no very good methods of generating the missing data, and so the
result will look a bit similar to when using some deinterlacing
filters. Generating the missing lines creates other issues, as well,
simply because the amount of data doubles. So, higher encoding
bitrates are required to maintain quality, and more CPU power is
used for both encoding and decoding. tfields has several different
options for how to create the missing lines of each frame. If you
use this method, then consult the manual and choose whichever
option looks best for your material. Note that when using
tfields you
have to specify both
-fps and -ofps to be twice the
framerate of your original source.
mencoder dvd://1 -oac copy -vf tfields=2 -ovc lavc -fps 60000/1001 -ofps 60000/1001
If you plan on downscaling dramatically, you can extract and encode
only one of the two fields. Of course, you will lose half the vertical
resolution, but if you plan on downscaling to at most 1/2 of the
original, the loss will not matter much. The result will be a
progressive 30000/1001 frames per second file. The procedure is to use
-vf field, then crop
[1] and scale
appropriately. Remember that you will have to adjust the scale to
compensate for the vertical resolution being halved.
mencoder dvd://1 -oac copy -vf field=0 -ovc lavc
Mixed progressive and telecine
In order to turn mixed progressive and telecine video into entirely
progressive video, the telecined parts have to be
inverse-telecined. There are three ways to accomplish this,
described below. Note that you should
always inverse-telecine before any
rescaling; unless you really know what you are doing,
inverse-telecine before cropping, too
[1].
-ofps 24000/1001 is needed here because the output video
will be 24000/1001 frames per second.
-vf pullup is designed to inverse-telecine
telecined material while leaving progressive data alone. In order to
work properly, pullup must
be followed by the softskip filter or
else MEncoder will crash.
pullup is, however, the cleanest and most
accurate method available for encoding both telecine and
"mixed progressive and telecine".
mencoder dvd://1 -oac copy -vf pullup,softskip -ovc lavc -ofps 24000/1001
An older method
is to, rather than inverse-telecine the telecined parts, telecine
the non-telecined parts and then inverse-telecine the whole
video. Sound confusing? softpulldown is a filter that goes through
a video and makes the entire file telecined. If we follow
softpulldown with either detc or
ivtc, the final result will be entirely
progressive. -ofps 24000/1001 is needed.
mencoder dvd://1 -oac copy -vf softpulldown,ivtc=1 -ovc lavc -ofps 24000/1001
I have not used -vf filmdint myself, but here is what
D Richard Felker III has to say:
It is OK, but IMO it tries to deinterlace rather
than doing inverse telecine too often (much like settop DVD
players & progressive TVs) which gives ugly flickering and
other artifacts. If you are going to use it, you at least need to
spend some time tuning the options and watching the output first
to make sure it is not messing up.
Mixed progressive and interlaced
There are two options for dealing with this category, each of
which is a compromise. You should decide based on the
duration/location of each type.
Treat it as progressive. The interlaced parts will look interlaced,
and some of the interlaced fields will have to be dropped, resulting
in a bit of uneven jumpiness. You can use a postprocessing filter if
you want to, but it may slightly degrade the progressive parts.
This option should definitely not be used if you want to eventually
display the video on an interlaced device (with a TV card, for
example). If you have interlaced frames in a 24000/1001 frames per
second video, they will be telecined along with the progressive
frames. Half of the interlaced "frames" will be displayed for three
fields' duration (3/(60000/1001) seconds), resulting in a flickering
"jump back in time" effect that looks quite bad. If you
even attempt this, you must use a
deinterlacing filter.
It may also be a bad idea for progressive display, too. It will drop
pairs of consecutive interlaced fields, resulting in a discontinuity
that can be more visible than with the second method, which shows
some progressive frames twice. 30000/1001 frames per second interlaced
video is already a bit choppy because it really should be shown at
60000/1001 fields per second, so the duplicate frames do not stand out as
much.
Either way, it is best to consider your content and how you intend to
display it. If your video is 90% progressive and you never intend to
show it on a TV, you should favor a progressive approach. If it is
only half progressive, you probably want to encode it as if it is all
interlaced.
Treat it as interlaced. Some frames of the progressive parts will
need to be duplicated, resulting in uneven jumpiness. Again,
deinterlacing filters may slightly degrade the progressive parts.
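The per-category recommendations above can be collected into a small lookup. This sketch only prints the bare-minimum command lines already shown in this section; the function name and the category labels are made up for illustration, and like the examples themselves the output is not meant for actual use.

```shell
#!/bin/sh
# Print the bare-minimum mencoder command for a content category,
# using only the option sets from the examples in this section.
# The category labels are hypothetical.
encode_cmd() {
    case "$1" in
        progressive)
            echo "mencoder dvd://1 -oac copy -ovc lavc -ofps 24000/1001" ;;
        telecined|mixed-telecine)
            echo "mencoder dvd://1 -oac copy -vf pullup,softskip -ovc lavc -ofps 24000/1001" ;;
        interlaced)
            echo "mencoder dvd://1 -oac copy -ovc lavc -lavcopts ildct:ilme:mbd=2" ;;
        *)
            echo "unknown category: $1" >&2
            return 1 ;;
    esac
}

encode_cmd telecined
```

Mixed progressive and interlaced content is deliberately left out, since the choice between the two compromises depends on the material.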
Footnotes
About cropping:
Video data on DVDs are stored in a format called YUV 4:2:0. In YUV
video, luma ("brightness") and chroma ("color")
are stored separately. Because the human eye is somewhat less
sensitive to color than it is to brightness, in a YUV 4:2:0 picture
there is only one chroma pixel for every four luma pixels. In a
progressive picture, each square of four luma pixels (two on each
side) has one common chroma pixel. You must crop progressive YUV
4:2:0 to even resolutions, and use even offsets. For example,
a crop with x and y offsets of 2 and 26 is OK, but one with an
x offset of 3 is not.
When you are dealing with interlaced YUV 4:2:0, the situation is a
bit more complicated. Instead of every four luma pixels in the
frame sharing a chroma pixel, every four luma
pixels in each field share a chroma
pixel. When fields are interlaced to form a frame, each scanline is
one pixel high. Now, instead of all four luma pixels being in a
square, there are two pixels side-by-side, and the other two pixels
are side-by-side two scanlines down. The two luma pixels in the
intermediate scanline are from the other field, and so share a
different chroma pixel with two luma pixels two scanlines away. All
this confusion makes it necessary to have vertical crop dimensions
and offsets be multiples of four. Horizontal can stay even.
For telecined video, I recommend that cropping take place after
inverse telecining. Once the video is progressive you only need to
crop by even numbers. If you really want to gain the slight speedup
that cropping first may offer, you must crop vertically by multiples
of four or else the inverse-telecine filter will not have proper data.
For interlaced (not telecined) video, you must always crop
vertically by multiples of four unless you use -vf field before cropping.
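The alignment rules for interlaced material can be applied mechanically. This is a minimal sketch: the helper name `align_down` and the requested crop values are made up, and rounding down is only one possible policy (for offsets you may prefer to round up so that no border remains).

```shell
#!/bin/sh
# Round a value down to a multiple of the given step
# (hypothetical helper).
align_down() {
    echo $(( $1 / $2 * $2 ))
}

# Hypothetical crop request w:h:x:y = 714:362:3:58 on interlaced
# video: horizontal values must be even, vertical values must be
# multiples of four.
w=$(align_down 714 2); h=$(align_down 362 4)
x=$(align_down 3 2);   y=$(align_down 58 4)
echo "crop=$w:$h:$x:$y"   # prints: crop=714:360:2:56
```

For progressive material a step of 2 is enough in both directions.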
About encoding parameters and quality:
Just because I recommend mbd=2 here does not mean it
should not be used elsewhere. Along with trell,
mbd=2 is one of the two
libavcodec options that
increases quality the most, and you should always use at least those
two unless the drop in encoding speed is prohibitive (e.g. realtime
encoding). There are many other options to
libavcodec that increase
encoding quality (and decrease encoding speed) but that is beyond
the scope of this document.
About the performance of pullup:
It is safe to use pullup (along with softskip) on progressive video, and is usually a good idea unless
the source has been definitively verified to be entirely progressive.
The performance loss is small for most cases. On a bare-minimum encode,
pullup causes MEncoder to
be 50% slower. Adding sound processing and advanced lavcopts overshadows that difference, bringing the performance
decrease of using pullup down to 2%.
Encoding with the libavcodec codec family
libavcodec
provides simple encoding to a lot of interesting video and audio formats.
You can encode to the following codecs (more or less up to date):
libavcodec's video codecs
Video codec name   Description
mjpeg              Motion JPEG
ljpeg              lossless JPEG
h261               H.261
h263               H.263
h263p              H.263+
mpeg4              ISO standard MPEG-4 (DivX 5, XviD compatible)
msmpeg4            pre-standard MPEG-4 variant by MS, v3 (AKA DivX3)
msmpeg4v2          pre-standard MPEG-4 by MS, v2 (used in old ASF files)
wmv1               Windows Media Video, version 1 (AKA WMV7)
wmv2               Windows Media Video, version 2 (AKA WMV8)
rv10               RealVideo 1.0
rv20               RealVideo 2.0
mpeg1video         MPEG-1 video
mpeg2video         MPEG-2 video
huffyuv            lossless compression
asv1               ASUS Video v1
asv2               ASUS Video v2
ffv1               FFmpeg's lossless video codec
svq1               Sorenson video 1
flv                Sorenson H.263 used in Flash Video
dvvideo            Sony Digital Video
snow               FFmpeg's experimental wavelet-based codec
The first column contains the codec names that should be passed after the
vcodec config.
An example with MJPEG compression:
mencoder dvd://2 -o title2.avi -ovc lavc -lavcopts vcodec=mjpeg -oac copy
libavcodec's audio codecs
Audio codec name   Description
mp2                MPEG Layer 2
ac3                AC3, AKA Dolby Digital
adpcm_ima_wav      IMA adaptive PCM (4 bits per sample, 4:1 compression)
sonic              experimental lossy/lossless codec
The first column contains the codec names that should be passed after the
acodec option.
An example with AC3 compression:
mencoder dvd://2 -o title2.avi -oac lavc -lavcopts acodec=ac3 -ovc copy
Contrary to libavcodec's video
codecs, its audio codecs do not make a wise usage of the bits they are
given as they lack some minimal psychoacoustic model (if at all)
which most other codec implementations feature.
However, note that all these audio codecs are very fast and work
out-of-the-box everywhere MEncoder has been
compiled with libavcodec (which
is the case most of time), and do not depend on external libraries.
Encoding options of libavcodec
Ideally, you would probably want to be able to just tell the encoder to switch
into "high quality" mode and move on.
That would probably be nice, but unfortunately hard to implement as different
encoding options yield different quality results depending on the source material.
That is because compression depends on the visual properties of the video
in question.
For example, anime and live action have very different properties and
thus require different options to obtain optimum encoding.
The good news is that some options should never be left out, like
mbd=2, trell, and v4mv.
See below for a detailed description of common encoding options.
Options to adjust:
vmax_b_frames: 1 or 2 is good, depending on
the movie.
Note that if you need to have your encode be decodable by DivX5, you
need to activate closed GOP support, using
libavcodec's cgop
option. This also requires deactivating scene detection, which
is not a good idea as it will hurt encode efficiency a bit.
vb_strategy=1: helps in high-motion scenes.
On some videos, vmax_b_frames may hurt quality, but vmax_b_frames=2 along
with vb_strategy=1 helps.
dia: motion search range. Bigger is better
and slower.
Negative values are a completely different scale.
Good values are -1 for a fast encode, or 2-4 for slower.
predia: motion search pre-pass.
Not as important as dia. Good values are 1 (default) to 4. Requires preme=2
to really be useful.
cmp, subcmp, precmp: Comparison function for
motion estimation.
Experiment with values of 0 (default), 2 (hadamard), 3 (dct), and 6 (rate
distortion).
0 is fastest, and sufficient for precmp.
For cmp and subcmp, 2 is good for anime, and 3 is good for live action.
6 may or may not be slightly better, but is slow.
last_pred: Number of motion predictors to
take from the previous frame.
1-3 or so help at little speed cost.
Higher values are slow for no extra gain.
cbp, mv0: Controls the selection of macroblocks.
Small speed cost for small quality gain.
qprd: adaptive quantization based on the
macroblock's complexity.
May help or hurt depending on the video and other options.
This can cause artifacts unless you set vqmax to some reasonably small value
(6 is good, maybe as low as 4); vqmin=1 should also help.
qns: very slow, especially when combined
with qprd.
This option will make the encoder minimize noise due to compression
artifacts instead of making the encoded video strictly match the source.
Do not use this unless you have already tweaked everything else as far as it
will go and the results still are not good enough.
vqcomp: Tweak ratecontrol.
What values are good depends on the movie.
You can safely leave this alone if you want.
Reducing vqcomp puts more bits on low-complexity scenes, increasing it puts
them on high-complexity scenes (default: 0.5, range: 0-1. recommended range:
0.5-0.7).
vlelim, vcelim: Sets the single coefficient
elimination threshold for luminance and chroma planes.
These are encoded separately in all MPEG-like algorithms.
The idea behind these options is to use some good heuristics to determine
when the change in a block is less than the threshold you specify, and in
such a case, to just encode the block as "no change".
This saves bits and perhaps speeds up encoding. vlelim=-4 and vcelim=9
seem to be good for live movies, but seem not to help with anime;
when encoding animation, you should probably leave them unchanged.
qpel: Quarter pixel motion estimation.
MPEG-4 uses half pixel precision for its motion search by default,
therefore this option comes with an overhead as more information will be
stored in the encoded file.
The compression gain/loss depends on the movie, but it is usually not very
effective on anime.
qpel always incurs a significant cost in CPU decode time (+25% in
practice).
psnr: does not affect the actual encoding,
but writes a log file giving the type/size/quality of each frame, and
prints a summary of PSNR (Peak Signal to Noise Ratio) at the end.
Options not recommended to play with:
vme: The default is best.
lumi_mask, dark_mask: Psychovisual adaptive
quantization.
You do not want to play with those options if you care about quality.
Reasonable values may be effective in your case, but be warned this is very
subjective.
scplx_mask: Tries to prevent blocky
artifacts, but postprocessing is better.
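As a rough illustration of how the options above combine, the sketch below assembles a -lavcopts string for two kinds of source material. The function name lavcopts_for and both presets are assumptions made for this example, not tested recommendations; the values are simply the ones the option descriptions suggest, and mbd=2, trell and v4mv are taken from the two-pass example in this guide.

```shell
#!/bin/sh
# lavcopts_for (hypothetical helper): print a -lavcopts value for a content type.
lavcopts_for() {
    # Options used by the two-pass example in this guide.
    base="vcodec=mpeg4:mbd=2:trell:v4mv"
    # B-frames with vb_strategy=1, and a few predictors from the previous frame.
    common="vmax_b_frames=2:vb_strategy=1:precmp=0:last_pred=2"
    case "$1" in
        # cmp/subcmp=2 (hadamard) is described as good for anime;
        # vlelim/vcelim are left at their defaults for animation.
        anime)       extra="cmp=2:subcmp=2" ;;
        # cmp/subcmp=3 (dct) is described as good for live action,
        # together with vlelim=-4 and vcelim=9.
        live_action) extra="cmp=3:subcmp=3:vlelim=-4:vcelim=9" ;;
        *)           echo "unknown content type: $1" >&2; return 1 ;;
    esac
    echo "$base:$common:$extra"
}

# Print the command line instead of running it; input and output names are placeholders.
echo "mencoder dvd://1 -oac copy -ovc lavc -lavcopts $(lavcopts_for live_action):vbitrate=900 -o out.avi"
```

Treat the output as a starting point only: as noted above, the best settings depend on the source, so compare a short encode of each variant before committing to one.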
Encoding setting examples
The following settings are examples of different encoding
option combinations that affect the speed vs quality tradeoff
at the same target bitrate.
All the encoding settings were tested on a 720x448 @30000/1001 fps
video sample, the target bitrate was 900kbps, and the machine was an
AMD-64 3400+ at 2400 MHz in 64-bit mode.
Each encoding setting features the measured encoding speed (in
frames per second) and the PSNR loss (in dB) compared to the "very
high quality" setting.
Please understand that depending on your source, your machine type
and development advancements, you may get very different results.
Description        Encoding options  Speed (in fps)  Relative PSNR loss (in dB)
Very high quality                    6fps            0dB
High quality                         15fps           -0.5dB
Fast                                 42fps           -0.74dB
Realtime                             54fps           -1.21dB
Custom inter/intra matrices
With this feature of
libavcodec
you are able to set custom inter (P-frames/predicted frames) and intra
(I-frames/keyframes) matrices. It is supported by many of the codecs:
mpeg1video and mpeg2video
are reported as working.
A typical usage of this feature is to set the matrices preferred by the
KVCD specifications.
The KVCD "Notch" Quantization Matrix:
Intra:
8 9 12 22 26 27 29 34
9 10 14 26 27 29 34 37
12 14 18 27 29 34 37 38
22 26 27 31 36 37 38 40
26 27 29 36 39 38 40 48
27 29 34 37 38 40 48 58
29 34 37 38 40 48 58 69
34 37 38 40 48 58 69 79
Inter:
16 18 20 22 24 26 28 30
18 20 22 24 26 28 30 32
20 22 24 26 28 30 32 34
22 24 26 30 32 32 34 36
24 26 28 32 34 34 36 38
26 28 30 32 34 36 38 40
28 30 32 34 36 38 42 42
30 32 34 36 38 40 42 44
Usage:
$ mencoder input.avi -o output.avi -oac copy -ovc lavc -lavcopts inter_matrix=...:intra_matrix=...
$ mencoder input.avi -ovc lavc -lavcopts
vcodec=mpeg2video:intra_matrix=8,9,12,22,26,27,29,34,9,10,14,26,27,29,34,37,
12,14,18,27,29,34,37,38,22,26,27,31,36,37,38,40,26,27,29,36,39,38,40,48,27,
29,34,37,38,40,48,58,29,34,37,38,40,48,58,69,34,37,38,40,48,58,69,79
:inter_matrix=16,18,20,22,24,26,28,30,18,20,22,24,26,28,30,32,20,22,24,26,
28,30,32,34,22,24,26,30,32,32,34,36,24,26,28,32,34,34,36,38,26,28,30,32,34,
36,38,40,28,30,32,34,36,38,42,42,30,32,34,36,38,40,42,44 -oac copy -o svcd.mpg
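Typing 64 comma-separated values by hand is error-prone. The sketch below flattens a matrix written as eight rows into the single-line list that intra_matrix and inter_matrix take. The helper name flatten_matrix is made up for this illustration, and the resulting option string has to reach MEncoder as one argument without line breaks.

```shell
#!/bin/sh
# KVCD "Notch" intra matrix, copied from the listing above.
KVCD_INTRA='
 8  9 12 22 26 27 29 34
 9 10 14 26 27 29 34 37
12 14 18 27 29 34 37 38
22 26 27 31 36 37 38 40
26 27 29 36 39 38 40 48
27 29 34 37 38 40 48 58
29 34 37 38 40 48 58 69
34 37 38 40 48 58 69 79'

# flatten_matrix (hypothetical helper): read a whitespace-separated matrix on
# stdin and print its values as one comma-separated line.
flatten_matrix() {
    set -- $(cat)                 # unquoted on purpose: split on any whitespace
    if [ "$#" -ne 64 ]; then
        echo "expected 64 values, got $#" >&2
        return 1
    fi
    ( IFS=,; echo "$*" )          # join the positional parameters with commas
}

intra=$(echo "$KVCD_INTRA" | flatten_matrix)
echo "-lavcopts vcodec=mpeg2video:intra_matrix=$intra"
```

The same function can be fed the inter matrix; the count check catches a row that was dropped or duplicated while copying.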
Example
So, you have just bought your shiny new copy of Harry Potter and the Chamber
of Secrets (widescreen edition, of course), and you want to rip this DVD
so that you can add it to your Home Theatre PC. This is a region 1 DVD,
so it is NTSC. The example below will still apply to PAL, except you will
omit -ofps 24000/1001 (because the output framerate is the
same as the input framerate), and of course the crop dimensions will be
different.
After running mplayer dvd://1, we follow the process
detailed in the section How to deal
with telecine and interlacing in NTSC DVDs and discover that it is
24000/1001 fps progressive video, which means that we need not use an inverse
telecine filter, such as pullup or
filmdint.
Next, we want to determine the appropriate crop rectangle, so we use the
cropdetect filter:
mplayer dvd://1 -vf cropdetect
Make sure you seek to a fully filled frame (such as a bright scene), and
you will see in MPlayer's console output:
crop area: X: 0..719 Y: 57..419 (-vf crop=720:362:0:58)
We then play the movie back with this filter to test its correctness:
mplayer dvd://1 -vf crop=720:362:0:58
And we see that it looks perfectly fine. Next, we ensure the width and
height are a multiple of 16. The width is fine, however the height is
not. Since we did not fail 7th grade math, we know that the nearest
multiple of 16 lower than 362 is 352.
We could just use crop=720:352:0:58, but it would be nice
to take a little off the top and a little off the bottom so that we
retain the center. We have shrunk the height by 10 pixels, but we do not
want to increase the y-offset by 5 pixels since that is an odd number and
will adversely affect quality. Instead, we will increase the y-offset by
4 pixels:
mplayer dvd://1 -vf crop=720:352:0:62
Another reason to shave pixels from both the top and the bottom is that we
ensure we have eliminated any half-black pixels if they exist. Note that if
your video is telecined, make sure the pullup filter (or
whichever inverse telecine filter you decide to use) appears in the filter
chain before you crop. If it is interlaced, deinterlace before cropping.
(If you choose to preserve the interlaced video, then make sure your
vertical crop offset is a multiple of 4.)
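The arithmetic above can be sketched as a small shell function. The name centered_crop is invented for this illustration; it assumes the width is already a multiple of 16 and only adjusts the height and the y-offset.

```shell
#!/bin/sh
# centered_crop (hypothetical helper): shrink a detected crop height to a
# multiple of 16 and re-center it using an even y-offset.
# Arguments: width height x y, as reported by cropdetect.
centered_crop() {
    w=$1 h=$2 x=$3 y=$4
    new_h=$(( h / 16 * 16 ))          # largest multiple of 16 not above h
    shift_y=$(( (h - new_h) / 2 ))    # half of the removed rows
    shift_y=$(( shift_y / 2 * 2 ))    # round down to an even number
    echo "crop=$w:$new_h:$x:$(( y + shift_y ))"
}

centered_crop 720 362 0 58   # prints crop=720:352:0:62
```

For interlaced material that you intend to keep interlaced, the rounding step would need to go to a multiple of 4 instead, as the note above explains.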
If you are really concerned about losing those 10 pixels, you might
prefer instead to scale the dimensions down to the nearest multiple of 16.
The filter chain would look like:
-vf crop=720:362:0:58,scale=720:352
Scaling the video down like this will mean that some small amount of
detail is lost, though it probably will not be perceptible. Scaling up will
result in lower quality (unless you increase the bitrate). Cropping
discards those pixels altogether. It is a tradeoff that you will want to
consider for each circumstance. For example, if the DVD video was made
for television, you might want to avoid vertical scaling, since the line
sampling corresponds to the way the content was originally recorded.
On inspection, we see that our movie has a fair bit of action and high
amounts of detail, so we pick 2400Kbit for our bitrate.
We are now ready to do the two pass encode. Pass one:
mencoder dvd://1 -ofps 24000/1001 -oac copy -vf crop=720:352:0:62,hqdn3d=2:1:2 -ovc lavc \
-lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:mbcmp=3:autoaspect:vpass=1 \
-o Harry_Potter_2.avi
And pass two is the same, except that we specify vpass=2:
mencoder dvd://1 -ofps 24000/1001 -oac copy -vf crop=720:352:0:62,hqdn3d=2:1:2 -ovc lavc \
-lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:mbcmp=3:autoaspect:vpass=2 \
-o Harry_Potter_2.avi
The options v4mv:mbd=2:trell will greatly increase the
quality at the expense of encoding time. There is little reason to leave
these options out when the primary goal is quality. The options
cmp=3:subcmp=3:mbcmp=3 select a comparison function that
yields higher quality than the defaults. You might try experimenting with
this parameter (refer to the man page for the possible values) as
different functions can have a large impact on quality depending on the
source material. For example, if you find
libavcodec produces too much
blocky artifacting, you could try selecting the experimental NSSE as
comparison function via *cmp=10.
For this movie, the resulting AVI will be 138 minutes long and nearly
3GB. And because you said that file size does not matter, this is a
perfectly acceptable size. However, if you had wanted it smaller, you
could try a lower bitrate. Increasing the bitrate has diminishing
returns, so while we might clearly see an improvement from 1800Kbit to
2000Kbit, it might not be so noticeable above 2000Kbit. Feel
free to experiment until you are happy.
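To see where the "nearly 3GB" figure comes from, or to try other bitrates before encoding, the size can be estimated from bitrate and running time. This is a rough sketch: the function name is invented, container overhead is ignored, and the 448 kbit/s audio bitrate is an assumption about the AC3 track that -oac copy would carry over.

```shell
#!/bin/sh
# stream_size_mb (hypothetical helper): estimate the size of one stream in
# megabytes (1 MB = 1000 kB here) from an average bitrate in kbit/s and a
# running time in minutes.
stream_size_mb() {
    kbps=$1 minutes=$2
    echo $(( kbps * minutes * 60 / 8 / 1000 ))
}

video=$(stream_size_mb 2400 138)   # the video bitrate chosen above
audio=$(stream_size_mb 448 138)    # ASSUMPTION: 448 kbit/s AC3 audio track
echo "video: ${video} MB, audio: ${audio} MB, total: $(( video + audio )) MB"
```

Rerunning it with 1800 or 2000 in place of 2400 shows how much space a lower bitrate would save.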
Because we passed the source video through a denoise filter, you may want
to add some noise back during playback. This, along with the
spp post-processing filter, drastically improves the
perception of quality and helps eliminate blocky artifacts in the video.
With MPlayer's -autoq option,
you can vary the amount of post-processing done by the spp filter
depending on available CPU. Also, at this point, you may want to apply
gamma and/or color correction to best suit your display. For example:
mplayer Harry_Potter_2.avi -vf spp,noise=9ah:5ah,eq2=1.2 -autoq 3
Encoding with the XviD codec
XviD is a free library for
encoding MPEG-4 ASP video streams.
Before starting to encode, you need to
set up MEncoder to support it.
This guide mainly aims at featuring the same kind of information
as x264's encoding guide.
Therefore, please begin by reading
the first part
of that guide.
What options should I use to get the best results?
Please begin by reviewing the
XviD section of
MPlayer's man page.
This section is intended to be a supplement to the man page.
The XviD default settings are already a good tradeoff between
speed and quality, therefore you can safely stick to them if
the following section puzzles you.
Encoding options of XviD
vhq
This setting affects the macroblock decision algorithm, where the
higher the setting, the wiser the decision.
The default setting may be safely used for every encode, while
higher settings always help PSNR but are significantly slower.
Please note that a better PSNR does not necessarily mean
that the picture will look better, but tells you that it is
closer to the original.
Turning it off will noticeably speed up encoding; if speed is
critical for you, the tradeoff may be worth it.
bvhq
This does the same job as vhq, but does it on B-frames.
It has a negligible impact on speed, and slightly improves quality
(around +0.1dB PSNR).
max_bframes
A higher number of consecutive allowed B-frames usually improves
compressibility, although it may also lead to more blocking artifacts.
The default setting is a good tradeoff between compressibility and
quality, but you may increase it up to 3 if you are bitrate-starved.
You may also decrease it to 1 or 0 if you are aiming at perfect
quality, though in that case you should make sure your
target bitrate is high enough to ensure that the encoder does not
have to increase quantizers to reach it.
bf_threshold
This controls the B-frame sensitivity of the encoder, where a higher
value leads to more B-frames being used (and vice versa).
This setting is to be used together with max_bframes;
if you are bitrate-starved, you should increase both
max_bframes and bf_threshold,
while you may increase max_bframes and reduce
bf_threshold so that the encoder may use more
B-frames only in places that really
need them.
A low number of max_bframes and a high value of
bf_threshold is probably not a wise choice as it
will force the encoder to put B-frames in places that would not
benefit from them, therefore reducing visual quality.
However, if you need to be compatible with standalone players that
only support old DivX profiles (which only support up to 1
consecutive B-frame), this would be your only way to
increase compressibility through using B-frames.
trellis
Optimizes the quantization process to get an optimal tradeoff
between PSNR and bitrate, which allows significant bit saving.
These bits will in return be spent elsewhere on the video,
raising overall visual quality.
You should always leave it on as its impact on quality is huge.
Even if you are looking for speed, do not disable it until you
have turned down vhq and all other more
CPU-hungry options to the minimum.
hq_ac
Activates a better coefficient cost estimation method, which slightly
reduces filesize by around 0.15 to 0.19% (which corresponds to less
than 0.01dB PSNR increase), while having a negligible impact on speed.
It is therefore recommended to always leave it on.
cartoon
Designed to better encode cartoon content, and has no impact on
speed as it just tunes the mode decision heuristics for this type
of content.
me_quality
This setting is to control the precision of the motion estimation.
The higher me_quality, the more
precise the estimation of the original motion will be, and the
better the resulting clip will capture the original motion.
The default setting is best in all cases;
thus it is not recommended to turn it down unless you are
really looking for speed, as all the bits saved by a good motion
estimation would be spent elsewhere, raising overall quality.
Therefore, do not go any lower than 5, and even that only as a last
resort.
chroma_me
Improves motion estimation by also taking the chroma (color)
information into account, whereas me_quality
alone only uses luma (grayscale).
This slows down encoding by 5-10% but improves visual quality
quite a bit by reducing blocking effects and reduces filesize by
around 1.3%.
If you are looking for speed, you should disable this option before
starting to consider reducing me_quality.
chroma_opt
Is intended to increase chroma image quality around pure
white/black edges, rather than improving compression.
This can help to reduce the "red stairs" effect.
lumi_mask
Tries to give less bitrate to part of the picture that the
human eye cannot see very well, which should allow the encoder
to spend the saved bits on more important parts of the picture.
The quality of the encode yielded by this option highly depends
on personal preferences and on the type and monitor settings
used to watch it (typically, it will not look as good if it is
bright or if it is a TFT monitor).
qpel
Raise the number of candidate motion vectors by increasing
the precision of the motion estimation from halfpel to
quarterpel.
The idea is to find better motion vectors which will in return
reduce bitrate (hence increasing quality).
However, motion vectors with quarterpel precision require a
few extra bits to code, but the candidate vectors do not always
give (much) better results.
Quite often, the codec still spends bits on the extra precision,
but little or no extra quality is gained in return.
Unfortunately, there is no way to foresee the possible gains of
qpel, so you need to actually encode with and
without it to know for sure.
qpel can almost double encoding time, and
requires as much as 25% more processing power to decode.
It is not supported by all standalone players.
gmc
Tries to save bits on panning scenes by using a single motion
vector for the whole frame.
This almost always raises PSNR, but significantly slows down
encoding (as well as decoding).
Therefore, you should only use it when you have turned
vhq to the maximum.
XviD's GMC is more
sophisticated than DivX's, but is only supported by few
standalone players.
Encoding profiles
XviD supports encoding profiles through the profile option,
which are used to impose restrictions on the properties of the XviD video
stream such that it will be playable on anything which supports the
chosen profile.
The restrictions relate to resolutions, bitrates and certain MPEG-4
features.
The following table shows what each profile supports.
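Simple
Profile name                             0    1    2    3
Width [pixels]                           176  176  352  352
Height [pixels]                          144  144  288  288
Frame rate [fps]                         15   15   15   15
Max average bitrate [kbps]               64   64   128  384
Peak average bitrate over 3 secs [kbps]  -    -    -    -
Max. B-frames                            0    0    0    0
MPEG quantization                        -    -    -    -
Adaptive quantization                    -    -    -    -
Interlaced encoding                      -    -    -    -
Quarterpixel                             -    -    -    -
Global motion compensation               -    -    -    -
Advanced Simple
Profile name                             0    1    2    3    4     5
Width [pixels]                           176  176  352  352  352   720
Height [pixels]                          144  144  288  288  576   576
Frame rate [fps]                         30   30   15   30   30    30
Max average bitrate [kbps]               128  128  384  768  3000  8000
Peak average bitrate over 3 secs [kbps]  -    -    -    -    -     -
Max. B-frames                            -    -    -    -    -     -
MPEG quantization                        X    X    X    X    X     X
Adaptive quantization                    X    X    X    X    X     X
Interlaced encoding                      X    X    X    X    X     X
Quarterpixel                             X    X    X    X    X     X
Global motion compensation               X    X    X    X    X     X
DivX
Profile name                             Handheld  Portable NTSC  Portable PAL  Home Theater NTSC  Home Theater PAL  HDTV
Width [pixels]                           176       352            352           720                720               1280
Height [pixels]                          144       240            288           480                576               720
Frame rate [fps]                         15        30             25            30                 25                30
Max average bitrate [kbps]               537.6     4854           4854          4854               4854              9708.4
Peak average bitrate over 3 secs [kbps]  800       8000           8000          8000               8000              16000
Max. B-frames                            0         1              1             1                  1                 2
MPEG quantization                        -         -              -             -                  -                 -
Adaptive quantization                    X         X              X             X                  X                 X
Interlaced encoding                      -         -              -             X                  X                 X
Quarterpixel                             -         -              -             -                  -                 -
Global motion compensation               -         -              -             -                  -                 -
Encoding setting examples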
The following settings are examples of different encoding
option combinations that affect the speed vs quality tradeoff
at the same target bitrate.
All the encoding settings were tested on a 720x448 @30000/1001 fps
video sample, the target bitrate was 900kbps, and the machine was an
AMD-64 3400+ at 2400 MHz in 64-bit mode.
Each encoding setting features the measured encoding speed (in
frames per second) and the PSNR loss (in dB) compared to the "very
high quality" setting.
Please understand that depending on your source, your machine type
and development advancements, you may get very different results.
Description        Encoding options  Speed (in fps)  Relative PSNR loss (in dB)
Very high quality                    16fps           0dB
High quality                         18fps           -0.1dB
Fast                                 28fps           -0.69dB
Realtime                             38fps           -1.48dB
Encoding with the x264 codec
x264 is a free library for
encoding H.264/AVC video streams.
Before starting to encode, you need to
set up MEncoder to support it.
Encoding options of x264
Please begin by reviewing the
x264 section of
MPlayer's man page.
This section is intended to be a supplement to the man page.
Here you will find quick hints about which options are most
likely to interest most people. The man page is more terse,
but also more exhaustive, and it sometimes offers much better
technical detail.
Introduction
This guide considers two major categories of encoding options:
Options which mainly trade off encoding time vs. quality
Options which may be useful for fulfilling various personal
preferences and special requirements
Ultimately, only you can decide which options are best for your
purposes. The decision for the first class of options is the simplest:
you only have to decide whether you think the quality differences
justify the speed differences. For the second class of options,
preferences may be far more subjective, and more factors may be
involved. Note that some of the "personal preferences and special
requirements" options can still have large impacts on speed or quality,
but that is not what they are primarily useful for. A couple of the
"personal preference" options may even cause changes that look better
to some people, but look worse to others.
Before continuing, you need to understand that this guide uses only one
quality metric: global PSNR.
For a brief explanation of what PSNR is, see
the Wikipedia article on PSNR.
Global PSNR is the last PSNR number reported when you include
the psnr option in x264encopts.
Any time you read a claim about PSNR, one of the assumptions
behind the claim is that equal bitrates are used.
Nearly all of this guide's comments assume you are using
two pass.
When comparing options, there are two major reasons for using
two pass encoding.
First, using two pass often gains around 1dB PSNR, which is a
very big difference.
Secondly, testing options by doing direct quality comparisons
with one pass encodes introduces a major confounding
factor: bitrate often varies significantly with each encode.
It is not always easy to tell whether quality changes are due
mainly to changed options, or if they mostly reflect essentially
random differences in the achieved bitrate.
Options which primarily affect speed and quality
subq:
Of the options which allow you to trade off speed for quality,
subq and frameref (see below) are usually
by far the most important.
If you are interested in tweaking either speed or quality, these
are the first options you should consider.
On the speed dimension, the frameref and
subq options interact with each other fairly
strongly.
Experience shows that, with one reference frame,
(the default setting) takes about 35% more time than
.
With 6 reference frames, the penalty grows to over 60%.
's effect on PSNR seems fairly constant
regardless of the number of reference frames.
Typically, achieves 0.2-0.5 dB higher global
PSNR in comparison .
This is usually enough to be visible.
is the slowest, highest quality mode.
In comparison to , it usually gains 0.1-0.4 dB
global PSNR with speed costs varying from 25%-100%.
Unlike other levels of , the behavior of
does not depend much on
and . Instead, the effectiveness of depends mostly upon the number of B-frames used. In normal
usage, this means has a large impact on both speed
and quality in complex, high motion scenes, but it may not have much effect
in low-motion scenes. Note that it is still recommended to always set
to something other than zero (see below).
frameref:
frameref is set to 1 by default, but this
should not be taken to imply that it is reasonable to set it
to 1.
Merely raising frameref to 2 gains around
0.15dB PSNR with a 5-10% speed penalty; this seems like a
good tradeoff.
gains around 0.25dB PSNR over
, which should be a visible
difference.
is around 15% slower than
.
Unfortunately, diminishing returns set in rapidly.
can be expected to gain only
0.05-0.1 dB over at an additional
15% speed penalty.
Above , the quality gains are
usually very small (although you should keep in mind throughout
this whole discussion that it can vary quite a lot depending on
your source).
In a fairly typical case,
will improve global PSNR by a tiny 0.02dB over
, at a speed cost of 15%-20%.
At such high values, the only really
good thing that can be said is that increasing it even further will
almost certainly never harm
PSNR, but the additional quality benefits are barely even
measurable, let alone perceptible.
Note:
Raising frameref to unnecessarily high values
can and usually does
hurt coding efficiency if you turn CABAC off.
With CABAC on (the default behavior), the possibility of setting
frameref "too high" currently seems too remote
to even worry about, and in the future, optimizations may remove
the possibility altogether.
If you care about speed, a reasonable compromise is to use low
subq and frameref values on
the first pass, and then raise them on the second pass.
Typically, this has a negligible negative effect on the final
quality: You will probably lose well under 0.1dB PSNR, which
should be much too small of a difference to see.
However, different values of frameref can
occasionally affect frametype decision.
Most likely, these are rare outlying cases, but if you want to
be pretty sure, consider whether your video has either
fullscreen repetitive flashing patterns or very large temporary
occlusions which might force an I-frame.
Adjust the first-pass frameref so it is large
enough to contain the duration of the flashing cycle (or occlusion).
For example, if the scene flashes back and forth between two images
over a duration of three frames, set the first pass
frameref to 3 or higher.
This issue is probably extremely rare in live action video material,
but it does sometimes come up in video game captures.
me:
This option is for choosing the motion estimation search method.
Altering this option provides a straightforward quality-vs-speed
tradeoff. is only a few percent faster than
the default search, at a cost of under 0.1dB global PSNR. The
default setting () is a reasonable tradeoff
between speed and quality. gains a little under
0.1dB global PSNR, with a speed penalty that varies depending on
. At high values of
(e.g. 12 or so),
is about 40% slower than the default . With
, the speed penalty incurred drops to
25%-30%.
uses an exhaustive search that is too slow for
practical use.
4x4mv:
This option enables the use of 8x4, 4x8 and 4x4 subpartitions in
predicted macroblocks. Enabling it results in a fairly consistent
10%-15% loss of speed. This option is rather useless in source
containing only low motion, however in some high-motion source,
particularly source with lots of small moving objects, gains of
about 0.1dB can be expected.
bframes:
If you are used to encoding with other codecs, you may have found
that B-frames are not always useful.
In H.264, this has changed: there are new techniques and block
types that are possible in B-frames.
Usually, even a naive B-frame choice algorithm can have a
significant PSNR benefit.
It is interesting to note that using B-frames usually speeds up
the second pass somewhat, and may also speed up a single
pass encode if adaptive B-frame decision is turned off.
With adaptive B-frame decision turned off
('s ),
the optimal value for this setting is usually no more than
, or else high-motion scenes can suffer.
With adaptive B-frame decision on (the default behavior), it is
safe to use higher values; the encoder will reduce the use of
B-frames in scenes where they would hurt compression.
The encoder rarely chooses to use more than 3 or 4 B-frames;
setting this option any higher will have little effect.
b_adapt:
Note: This is on by default.
With this option enabled, the encoder will use a reasonably fast
decision process to reduce the number of B-frames used in scenes that
might not benefit from them as much.
You can use b_bias to tweak how B-frame-happy
the encoder is.
The speed penalty of adaptive B-frames is currently rather modest,
but so is the potential quality gain.
It usually does not hurt, however.
Note that this only affects speed and frametype decision on the
first pass.
b_adapt and b_bias have no
effect on subsequent passes.
b_pyramid:
You might as well enable this option if you are using >=2 B-frames;
as the man page says, you get a little quality improvement at no
speed cost.
Note that these videos cannot be read by libavcodec-based decoders
older than about March 5, 2005.
weight_b:
In typical cases, there is not much gain with this option.
However, in crossfades or fade-to-black scenes, weighted
prediction gives rather large bitrate savings.
In MPEG-4 ASP, a fade-to-black is usually best coded as a series
of expensive I-frames; using weighted prediction in B-frames
makes it possible to turn at least some of these into much smaller
B-frames.
Encoding time cost is minimal, as no extra decisions need to be made.
Also, contrary to what some people seem to guess, the decoder
CPU requirements are not much affected by weighted prediction,
all else being equal.
Unfortunately, the current adaptive B-frame decision algorithm
has a strong tendency to avoid B-frames during fades.
Until this changes, it may be a good idea to add
nob_adapt to your x264encopts, if you expect
fades to have a large effect in your particular video
clip.
Options pertaining to miscellaneous preferences
Two pass encoding:
Above, it was suggested to always use two pass encoding, but there
are still reasons for not using it. For instance, if you are capturing
live TV and encoding in realtime, you are forced to use single-pass.
Also, one pass is obviously faster than two passes; if you use the
exact same set of options on both passes, two pass encoding is almost
twice as slow.
Still, there are very good reasons for using two pass encoding. For
one thing, single pass ratecontrol is not psychic, and it often makes
unreasonable choices because it cannot see the big picture. For example,
suppose you have a two minute long video consisting of two distinct
halves. The first half is a very high-motion scene lasting 60 seconds
which, in isolation, requires about 2500kbps in order to look decent.
Immediately following it is a much less demanding 60-second scene
that looks good at 300kbps. Suppose you ask for 1400kbps on the theory
that this is enough to accommodate both scenes. Single pass ratecontrol
will make a couple of "mistakes" in such a case. First of all, it
will target 1400kbps in both segments. The first segment may end up
heavily overquantized, causing it to look unacceptably and unreasonably
blocky. The second segment will be heavily underquantized; it may look
perfect, but the bitrate cost of that perfection will be completely
unreasonable. What is even harder to avoid is the problem at the
transition between the two scenes. The first seconds of the low motion
half will be hugely over-quantized, because the ratecontrol is still
expecting the kind of bitrate requirements it met in the first half
of the video. This "error period" of heavily over-quantized low motion
will look jarringly bad, and will actually use less than the 300kbps
it would have taken to make it look decent. There are ways to
mitigate the pitfalls of single-pass encoding, but they may tend to
increase bitrate misprediction.
Multipass ratecontrol can offer huge advantages over a single pass.
Using the statistics gathered from the first pass encode, the encoder
can estimate, with reasonable accuracy, the "cost" (in bits) of
encoding any given frame, at any given quantizer. This allows for
a much more rational, better planned allocation of bits between the
expensive (high-motion) and cheap (low-motion) scenes. See
below for some ideas on how to tweak this
allocation to your liking.
Moreover, two passes need not take twice as long as one pass. You can
tweak the options in the first pass for higher speed and lower quality.
If you choose your options well, you can get a very fast first pass.
The resulting quality in the second pass will be slightly lower because size
prediction is less accurate, but the quality difference is normally much
too small to be visible. Try, for example, adding speed-oriented
options to the first pass. Then, on the second pass, use slower,
higher-quality options.
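As a rough sketch of the fast-first-pass idea, the function below only prints the two mencoder command lines instead of running them. The subq and frameref values, the 900kbps bitrate, and the file names are illustrative assumptions, not recommended or original settings; adjust them to your source.

```shell
# Print (do not run) a fast first pass followed by a slower second pass.
# Assumed, illustrative values: the subq/frameref settings and file names.
two_pass_cmds() {
    input=$1; output=$2; bitrate=$3
    # Pass 1: cheap analysis settings, no audio, video output discarded.
    echo "mencoder $input -nosound -ovc x264 -x264encopts pass=1:bitrate=$bitrate:subq=1:frameref=1 -o /dev/null"
    # Pass 2: slower, higher-quality settings that reuse the pass 1 statistics.
    echo "mencoder $input -oac copy -ovc x264 -x264encopts pass=2:bitrate=$bitrate:subq=6:frameref=5 -o $output"
}

two_pass_cmds movie.vob movie.avi 900
```

Both passes must use the same target bitrate, since the second pass plans its bit allocation from the statistics the first pass wrote.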
Three pass encoding?
x264 offers the ability to make an arbitrary number of consecutive
passes. If you specify pass=1 on the first pass,
then use pass=3 on a subsequent pass, the subsequent
pass will both read the statistics from the previous pass, and write
its own statistics. An additional pass following this one will have
a very good base from which to make highly accurate predictions of
framesizes at a chosen quantizer. In practice, the overall quality
gain from this is usually close to zero, and quite possibly a third
pass will result in slightly worse global PSNR than the pass before
it. In typical usage, three passes help if you get either bad bitrate
prediction or bad looking scene transitions when using only two passes.
This is somewhat likely to happen on extremely short clips. There are
also a few special cases in which three (or more) passes are handy
for advanced users, but for brevity, this guide omits discussing those
special cases.
qcomp:
qcomp trades off the number of bits allocated
to "expensive" high-motion versus "cheap" low-motion frames. At
one extreme, qcomp=0 aims for true constant
bitrate. Typically this would make high-motion scenes look completely
awful, while low-motion scenes would probably look absolutely
perfect, but would also use many times more bitrate than they
would need in order to look merely excellent. At the other extreme,
qcomp=1 achieves nearly constant quantization parameter
(QP). Constant QP does not look bad, but most people think it is more
reasonable to shave some bitrate off of the extremely expensive scenes
(where the loss of quality is not as noticeable) and reallocate it to
the scenes that are easier to encode at excellent quality.
qcomp is set to 0.6 by default, which may be slightly
low for many people's taste (0.7-0.8 are also commonly used).
keyint:
keyint is solely for trading off file seekability against
coding efficiency. By default, keyint is set to 250. In
25fps material, this guarantees the ability to seek to within 10 seconds
precision. If you think it would be important and useful to be able to
seek within 5 seconds of precision, set keyint=125;
this will hurt quality/bitrate slightly. If you care only about quality
and not about seekability, you can set it to much higher values
(understanding that there are diminishing returns which may become
vanishingly low, or even zero). The video stream will still have seekable
points as long as there are some scene changes.
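The relationship between keyint and seek precision is plain arithmetic: frames per second times the desired seek interval. A minimal sketch (the function name is hypothetical; the frame rate is passed as a fraction so that NTSC rates work in integer math):

```shell
# keyint needed so that keyframes are at most `seconds` apart.
# The frame rate is given as numerator/denominator (25/1, 30000/1001).
# The result is rounded down, so the seek guarantee still holds.
keyint_for_seek() {
    fps_num=$1; fps_den=$2; seconds=$3
    echo $(( fps_num * seconds / fps_den ))
}

keyint_for_seek 25 1 10        # prints 250 (the default)
keyint_for_seek 25 1 5         # prints 125
keyint_for_seek 30000 1001 5   # prints 149
```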
deblockalpha, deblockbeta:
This topic is going to be a bit controversial.
H.264 defines a simple deblocking procedure on I-blocks that uses
pre-set strengths and thresholds depending on the QP of the block
in question.
By default, high QP blocks are filtered heavily, and low QP blocks
are not deblocked at all.
The pre-set strengths defined by the standard are well-chosen and
the odds are very good that they are PSNR-optimal for whatever
video you are trying to encode.
The deblockalpha and deblockbeta
parameters allow you to specify offsets to the preset deblocking
thresholds.
Many people seem to think it is a good idea to lower the deblocking
filter strength by large amounts (say, -3).
This is however almost never a good idea, and in most cases,
people who are doing this do not understand very well how
deblocking works by default.
The first and most important thing to know about the in-loop
deblocking filter is that the default thresholds are almost always
PSNR-optimal.
In the rare cases that they are not optimal, the ideal offset is
plus or minus 1.
Adjusting deblocking parameters by a larger amount is almost
guaranteed to hurt PSNR.
Strengthening the filter will smear more details; weakening the
filter will increase the appearance of blockiness.
It is definitely a bad idea to lower the deblocking thresholds if
your source is mainly low in spatial complexity (i.e., not a lot
of detail or noise).
The in-loop filter does a rather excellent job of concealing
the artifacts that occur.
If the source is high in spatial complexity, however, artifacts
are less noticeable.
This is because the ringing tends to look like detail or noise.
Human visual perception easily notices when detail is removed,
but it does not so easily notice when the noise is wrongly
represented.
When it comes to subjective quality, noise and detail are somewhat
interchangeable.
By lowering the deblocking filter strength, you are most likely
increasing error by adding ringing artifacts, but the eye does
not notice because it confuses the artifacts with detail.
This still does not justify
lowering the deblocking filter strength, however.
You can generally get better quality noise from postprocessing.
If your H.264 encodes look too blurry or smeared, try playing with
-vf noise when you play your encoded movie.
A moderate amount of added noise should conceal most mild
artifacting.
It will almost certainly look better than the results you
would have gotten just by fiddling with the deblocking filter.
Encoding setting examples
The following settings are examples of different encoding
option combinations that affect the speed vs quality tradeoff
at the same target bitrate.
All the encoding settings were tested on a 720x448 @30000/1001 fps
video sample, the target bitrate was 900kbps, and the machine was an
AMD-64 3400+ at 2400 MHz in 64-bit mode.
Each encoding setting features the measured encoding speed (in
frames per second) and the PSNR loss (in dB) compared to the "very
high quality" setting.
Please understand that depending on your source, your machine type
and development advancements, you may get very different results.
Description         Encoding options    Speed (in fps)    Relative PSNR loss (in dB)
Very high quality                       6fps              0dB
High quality                            13fps             -0.89dB
Fast                                    17fps             -1.48dB
Using MEncoder to create VCD/SVCD/DVD-compliant files
Format Constraints
MEncoder is capable of creating VCD, SVCD
and DVD format MPEG files using the
libavcodec library.
These files can then be used in conjunction with
vcdimager
or
dvdauthor
to create discs that will play on a standard set-top player.
The DVD, SVCD, and VCD formats are subject to heavy constraints.
Only a small selection of encoded picture sizes and aspect ratios are
available.
If your movie does not already meet these requirements, you may have
to scale, crop, or add black borders to the picture to make it
compliant.
Format Constraints
Format     Resolution                          V. Codec  V. Bitrate  Sample Rate  A. Codec     A. Bitrate       FPS                     Aspect
NTSC DVD   720x480, 704x480, 352x480, 352x240  MPEG-2    9800 kbps   48000 Hz     AC3,PCM      1536 kbps (max)  30000/1001, 24000/1001  4:3, 16:9 (only for 720x480)
NTSC DVD   352x240 [a]                         MPEG-1    1856 kbps   48000 Hz     AC3,PCM      1536 kbps (max)  30000/1001, 24000/1001  4:3, 16:9
NTSC SVCD  480x480                             MPEG-2    2600 kbps   44100 Hz     MP2          384 kbps (max)   30000/1001              4:3
NTSC VCD   352x240                             MPEG-1    1150 kbps   44100 Hz     MP2          224 kbps         24000/1001, 30000/1001  4:3
PAL DVD    720x576, 704x576, 352x576, 352x288  MPEG-2    9800 kbps   48000 Hz     MP2,AC3,PCM  1536 kbps (max)  25                      4:3, 16:9 (only for 720x576)
PAL DVD    352x288 [a]                         MPEG-1    1856 kbps   48000 Hz     MP2,AC3,PCM  1536 kbps (max)  25                      4:3, 16:9
PAL SVCD   480x576                             MPEG-2    2600 kbps   44100 Hz     MP2          384 kbps (max)   25                      4:3
PAL VCD    352x288                             MPEG-1    1152 kbps   44100 Hz     MP2          224 kbps         25                      4:3
[a] These resolutions are rarely used for DVDs because they are fairly low quality.
If your movie has 2.35:1 aspect (most recent action movies), you will
have to add black borders or crop the movie down to 16:9 to make a DVD
or VCD.
If you add black borders, try to align them at 16-pixel boundaries in
order to minimize the impact on encoding performance.
Thankfully, DVD has enough bitrate to spare that you do not have
to worry too much about encoding efficiency, but SVCD and VCD are
highly bitrate-starved and require effort to obtain acceptable quality.
GOP Size Constraints
DVD, VCD, and SVCD also constrain you to relatively low
GOP (Group of Pictures) sizes.
For 30 fps material the largest allowed GOP size is 18.
For 25 or 24 fps, the maximum is 15.
The GOP size is set using the keyint option.
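The GOP limits above can be captured in a small lookup. This is only a sketch: the helper name is hypothetical, and it assumes the GOP size is passed as the keyint field of -lavcopts, as in the examples elsewhere in this guide.

```shell
# Largest GOP size allowed by the limits above for a given frame rate.
max_gop() {
    case $1 in
        30|30000/1001) echo 18 ;;
        25|24|24000/1001) echo 15 ;;
        *) echo "unsupported frame rate: $1" >&2; return 1 ;;
    esac
}

echo "-lavcopts keyint=$(max_gop 30000/1001)"   # prints -lavcopts keyint=18
```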
Bitrate Constraints
VCD video is required to be CBR at 1152 kbps.
This highly limiting constraint also comes along with an extremely low vbv
buffer size of 327 kilobits.
SVCD allows varying video bitrates up to 2500 kbps, and a somewhat less
restrictive vbv buffer size of 917 kilobits is allowed.
DVD video bitrates may range anywhere up to 9800 kbps (though typical
bitrates are about half that), and the vbv buffer size is 1835 kilobits.
Output Options
MEncoder has options to control the output
format.
Using these options we can instruct it to create the correct type of
file.
The options for VCD and SVCD are called xvcd and xsvcd, because they
are extended formats.
They are not strictly compliant, mainly because the output does not
contain scan offsets.
If you need to generate an SVCD image, you should pass the output file
to
vcdimager.
VCD:
-of mpeg -mpegopts format=xvcd
SVCD:
-of mpeg -mpegopts format=xsvcd
DVD:
-of mpeg -mpegopts format=dvd
DVD with NTSC Pullup:
-of mpeg -mpegopts format=dvd:telecine -ofps 24000/1001
This allows 24000/1001 fps progressive content to be encoded at 30000/1001
fps whilst maintaining DVD-compliance.
Aspect Ratio
The aspect argument of -lavcopts is used to encode
the aspect ratio of the file.
During playback the aspect ratio is used to restore the video to the
correct size.
16:9 or "Widescreen"
-lavcopts aspect=16/9
4:3 or "Fullscreen"
-lavcopts aspect=4/3
2.35:1 or "Cinemascope" NTSC
-vf scale=720:368,expand=720:480 -lavcopts aspect=16/9
To calculate the correct scaling size, divide the expanded NTSC width of
854 by 2.35 and round to the nearest multiple of 16: 854/2.35 = 363, rounded to 368
2.35:1 or "Cinemascope" PAL
-vf scale=720:432,expand=720:576 -lavcopts aspect=16/9
To calculate the correct scaling size, divide the expanded PAL width of
1024 by 2.35 and round to the nearest multiple of 16: 1024/2.35 = 436, rounded to 432
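The two scaling heights used above can be reproduced with integer arithmetic. The helper below is a hypothetical sketch: it computes the height of a 2.35:1 picture inside a 16:9 anamorphic frame and rounds it to the nearest multiple of 16, which is an assumption consistent with the 368 and 432 figures in the examples.

```shell
# Height of a 2.35:1 picture inside a 16:9 anamorphic frame of height
# frame_h, rounded to the nearest multiple of 16 (integer arithmetic only).
scope_height() {
    frame_h=$1
    exact=$(( frame_h * 16 * 100 / (9 * 235) ))   # frame_h * (16/9) / 2.35
    echo $(( (exact + 8) / 16 * 16 ))
}

h=$(scope_height 480)
echo "-vf scale=720:$h,expand=720:480"   # prints -vf scale=720:368,expand=720:480
h=$(scope_height 576)
echo "-vf scale=720:$h,expand=720:576"   # prints -vf scale=720:432,expand=720:576
```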
Sample Rate Conversion
If the audio sample rate in the original file is not the same as
required by the target format, sample rate conversion is required.
This is achieved using the -srate option and
the lavcresample audio filter together.
DVD:
-srate 48000 -af lavcresample=48000
VCD and SVCD:
-srate 44100 -af lavcresample=44100
Using libavcodec for VCD/SVCD/DVD Encoding
Introduction
libavcodec can be used to
create VCD/SVCD/DVD compliant video by using the appropriate options.
lavcopts
This is a list of fields in -lavcopts that you may
be required to change in order to make a compliant movie for VCD, SVCD,
or DVD:
acodec:
mp2 for VCD, SVCD, or PAL DVD;
ac3 is most commonly used for DVD.
PCM audio may also be used for DVD, but this is mostly a big waste of
space.
Note that MP3 audio is not compliant for any of these formats, but
players often have no problem playing it anyway.
abitrate:
224 for VCD; up to 384 for SVCD; up to 1536 for DVD, but commonly
used values range from 192 kbps for stereo to 384 kbps for 5.1 channel
sound.
vcodec:
mpeg1video for VCD;
mpeg2video for SVCD;
mpeg2video is usually used for DVD but you may also use
mpeg1video for CIF resolutions.
keyint:
Used to set the GOP size.
18 for 30fps material, or 15 for 25/24 fps material.
Commercial producers seem to prefer keyframe intervals of 12.
It is possible to make this much larger and still retain compatibility
with most players.
A keyint of 25 should never cause any problems.
vrc_buf_size:
327 for VCD, 917 for SVCD, and 1835 for DVD.
vrc_minrate:
1152 for VCD. May be left alone for SVCD and DVD.
vrc_maxrate:
1152 for VCD; 2500 for SVCD; 9800 for DVD.
For SVCD and DVD, you might wish to use lower values depending on your
own personal preferences and requirements.
vbitrate:
1152 for VCD;
up to 2500 for SVCD;
up to 9800 for DVD.
For the latter two formats, vbitrate should be set based on personal
preference.
For instance, if you insist on fitting 20 or so hours on a DVD, you
could use vbitrate=400.
The resulting video quality would probably be quite bad.
If you are trying to squeeze out the maximum possible quality on a DVD,
use vbitrate=9800, but be warned that this could constrain you to less
than an hour of video on a single-layer DVD.
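To get a feel for the trade-off between vbitrate and playing time, the sketch below estimates how many minutes fit on a disc. It is an approximation under stated assumptions: a nominal single-layer capacity of 4.7 billion bytes, 1 kbps taken as 1000 bits per second, and no allowance for multiplexing overhead or menus, so real-world figures come out lower. The function name is hypothetical.

```shell
# Minutes of video that fit on a disc, ignoring muxing overhead.
# capacity in bytes, bitrates in kbps (1 kbps = 1000 bits/s).
# Needs a shell with 64-bit arithmetic (true of bash and most sh builds).
minutes_on_disc() {
    capacity_bytes=$1; video_kbps=$2; audio_kbps=$3
    echo $(( capacity_bytes * 8 / ((video_kbps + audio_kbps) * 1000) / 60 ))
}

minutes_on_disc 4700000000 9800 192   # prints 62
minutes_on_disc 4700000000 5000 192   # prints 120
```

At the maximum video bitrate the estimate is barely over an hour before overhead, which is why the practical limit is under an hour.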
Examples
This is a typical minimum set of -lavcopts for
encoding video:
VCD:
-lavcopts vcodec=mpeg1video:vrc_buf_size=327:vrc_minrate=1152:\
vrc_maxrate=1152:vbitrate=1152:keyint=15:acodec=mp2
SVCD:
-lavcopts vcodec=mpeg2video:vrc_buf_size=917:vrc_maxrate=2500:vbitrate=1800:\
keyint=15:acodec=mp2
DVD:
-lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:\
keyint=15:acodec=ac3
Advanced Options
For higher quality encoding, you may also wish to add quality-enhancing
options to lavcopts, such as trell,
mbd=2, and others.
Note that qpel and v4mv, while often
useful with MPEG-4, are not usable with MPEG-1 or MPEG-2.
Also, if you are trying to make a very high quality DVD encode, it may
be useful to add dc=10 to lavcopts.
Doing so may help reduce the appearance of blocks in flat-colored areas.
Putting it all together, this is an example of a set of lavcopts for a
higher quality DVD:
-lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=8000:\
keyint=15:trell:mbd=2:precmp=2:subcmp=2:cmp=2:dia=-10:predia=-10:cbp:mv0:\
vqmin=1:lmin=1:dc=10
Encoding Audio
VCD and SVCD support MPEG-1 layer II audio, using one of
toolame,
twolame,
or libavcodec's MP2 encoder.
The libavcodec MP2 encoder is far from being as good as the other two
libraries; however, it should always be available to use.
VCD only supports constant bitrate audio (CBR) whereas SVCD supports
variable bitrate (VBR), too.
Be careful when using VBR because some bad standalone players might not
support it too well.
For DVD audio, libavcodec's
AC3 codec is used.
toolame
For VCD and SVCD:
-oac toolame -toolameopts br=224
twolame
For VCD and SVCD:
-oac twolame -twolameopts br=224
libavcodec
For DVD with 2 channel sound:
-oac lavc -lavcopts acodec=ac3:abitrate=192
For DVD with 5.1 channel sound:
-channels 6 -oac lavc -lavcopts acodec=ac3:abitrate=384
For VCD and SVCD:
-oac lavc -lavcopts acodec=mp2:abitrate=224
Putting it all Together
This section shows some complete commands for creating VCD/SVCD/DVD
compliant videos.
PAL DVD
mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=dvd -vf scale=720:576,\
harddup -srate 48000 -af lavcresample=48000 -lavcopts vcodec=mpeg2video:\
vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:keyint=15:acodec=ac3:\
abitrate=192:aspect=16/9 -ofps 25 \
-o movie.mpg movie.avi
NTSC DVD
mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=dvd -vf scale=720:480,\
harddup -srate 48000 -af lavcresample=48000 -lavcopts vcodec=mpeg2video:\
vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=5000:keyint=18:acodec=ac3:\
abitrate=192:aspect=16/9 -ofps 30000/1001 \
-o movie.mpg movie.avi
PAL AVI Containing AC3 Audio to DVD
If the source already has AC3 audio, use -oac copy instead of re-encoding it.
mencoder -oac copy -ovc lavc -of mpeg -mpegopts format=dvd -vf scale=720:576,\
harddup -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:\
vbitrate=5000:keyint=15:aspect=16/9 -ofps 25 \
-o movie.mpg movie.avi
NTSC AVI Containing AC3 Audio to DVD
If the source already has AC3 audio, and is NTSC @ 24000/1001 fps:
mencoder -oac copy -ovc lavc -of mpeg -mpegopts format=dvd:telecine \
-vf scale=720:480,harddup -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:\
vrc_maxrate=9800:vbitrate=5000:keyint=15:aspect=16/9 -ofps 24000/1001 \
-o movie.mpg movie.avi
PAL SVCD
mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xsvcd -vf \
scale=480:576,harddup -srate 44100 -af lavcresample=44100 -lavcopts \
vcodec=mpeg2video:mbd=2:keyint=15:vrc_buf_size=917:vrc_minrate=600:\
vbitrate=2500:vrc_maxrate=2500:acodec=mp2:abitrate=224 -ofps 25 \
-o movie.mpg movie.avi
NTSC SVCD
mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xsvcd -vf \
scale=480:480,harddup -srate 44100 -af lavcresample=44100 -lavcopts \
vcodec=mpeg2video:mbd=2:keyint=18:vrc_buf_size=917:vrc_minrate=600:\
vbitrate=2500:vrc_maxrate=2500:acodec=mp2:abitrate=224 -ofps 30000/1001 \
-o movie.mpg movie.avi
PAL VCD
mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xvcd -vf \
scale=352:288,harddup -srate 44100 -af lavcresample=44100 -lavcopts \
vcodec=mpeg1video:keyint=15:vrc_buf_size=327:vrc_minrate=1152:vbitrate=1152:\
vrc_maxrate=1152:acodec=mp2:abitrate=224 -ofps 25 \
-o movie.mpg movie.avi
NTSC VCD
mencoder -oac lavc -ovc lavc -of mpeg -mpegopts format=xvcd -vf \
scale=352:240,harddup -srate 44100 -af lavcresample=44100 -lavcopts \
vcodec=mpeg1video:keyint=18:vrc_buf_size=327:vrc_minrate=1152:vbitrate=1152:\
vrc_maxrate=1152:acodec=mp2:abitrate=224 -ofps 30000/1001 \
-o movie.mpg movie.avi