Encoding with MEncoder

For the complete list of available MEncoder options and examples, please see the man page. For a series of hands-on examples and detailed guides on using several encoding parameters, read the encoding-tips that were collected from several mailing list threads on MPlayer-users. Search the archives for a wealth of discussions about all aspects of and problems related to encoding with MEncoder.

Encoding two pass MPEG-4 ("DivX")

The name comes from the fact that this method encodes the file twice. The first encoding (dubbed pass) creates some temporary files (*.log) with a size of a few megabytes. Do not delete them yet (you can delete the AVI). In the second pass, the two pass output file is created, using the bitrate data from the temporary files. The resulting file will have much better image quality. If this is the first time you have heard about this, you should consult some guides available on the net.

copy audio track

Two pass encode of a DVD to an MPEG-4 ("DivX") AVI while copying the audio track.

    mencoder dvd://2 -ovc lavc -lavcopts vcodec=mpeg4:vpass=1 -oac copy -o movie.avi
    mencoder dvd://2 -ovc lavc -lavcopts vcodec=mpeg4:vpass=2 -oac copy -o movie.avi

encode audio track

Two pass encode of a DVD to an MPEG-4 ("DivX") AVI while encoding the audio track to MP3.

    mencoder dvd://2 -ovc lavc -lavcopts vcodec=mpeg4:vpass=1 -oac mp3lame -lameopts vbr=3 -o movie.avi
    mencoder dvd://2 -ovc lavc -lavcopts vcodec=mpeg4:vpass=2 -oac mp3lame -lameopts vbr=3 -o movie.avi

Encoding to MPEG format

MEncoder can create MPEG (MPEG-PS) format output files. It is probably useful only with libavcodec's mpeg1video codec, because players - except MPlayer - expect MPEG-1 video, and MPEG-1 layer 2 (MP2) audio streams in MPEG files.
This feature is not very useful right now: it probably has many bugs and, more importantly, MEncoder currently cannot encode MPEG-1 layer 2 (MP2) audio, which all other players expect in MPEG files.

To change MEncoder's output file format, use the -of option.

Example:

    mencoder -of mpeg -ovc lavc -lavcopts vcodec=mpeg1video -oac copy other_options media.avi -o output.mpg

Rescaling movies

Often the need to resize movie images emerges. The reasons can be many: decreasing file size, network bandwidth, etc. Most people even rescale when converting DVDs or SVCDs to DivX AVI. If you wish to rescale, read the Preserving aspect ratio section.

The scaling process is handled by the scale video filter: -vf scale=width:height. Its quality can be set with the -sws option. If it is not specified, MEncoder will use 2: bicubic.

Usage:

    mencoder input.mpg -ovc lavc -lavcopts vcodec=mpeg4 -vf scale=640:480 -o output.avi

Stream copying

MEncoder can handle input streams in two ways: encode or copy them. This section is about copying.

Video stream (option -ovc copy): nice stuff can be done :) Like, putting (not converting!) FLI or VIVO or MPEG-1 video into an AVI file! Of course only MPlayer can play such files :) And it probably has no real life value at all. Rationally: video stream copying can be useful for example when only the audio stream has to be encoded (like, uncompressed PCM to MP3).

Audio stream (option -oac copy): straightforward. It is possible to take an external audio file (MP3, WAV) and mux it into the output stream. Use the -audiofile filename option for this.

Encoding with the libavcodec codec family

libavcodec provides simple encoding to a lot of interesting video and audio formats.
You can encode to the following codecs (more or less up to date):

    Codec name    Description
    mjpeg         Motion JPEG
    ljpeg         Lossless JPEG
    h263          H.263
    h263p         H.263+
    mpeg4         ISO standard MPEG-4 (DivX 5, XVID compatible)
    msmpeg4       pre-standard MPEG-4 variant by MS, v3 (AKA DivX3)
    msmpeg4v2     pre-standard MPEG-4 by MS, v2 (used in old asf files)
    wmv1          Windows Media Video, version 1 (AKA WMV7)
    wmv2          Windows Media Video, version 2 (AKA WMV8)
    rv10          an old RealVideo codec
    mpeg1video    MPEG-1 video
    mpeg2video    MPEG-2 video
    huffyuv       lossless compression
    asv1          ASUS Video v1
    asv2          ASUS Video v2
    ffv1          FFmpeg's lossless video codec

The first column contains the codec names that should be passed after the vcodec config, like: -lavcopts vcodec=msmpeg4

An example, with MJPEG compression:

    mencoder dvd://2 -o title2.avi -ovc lavc -lavcopts vcodec=mjpeg -oac copy

Encoding from multiple input image files (JPEG, PNG, TGA, SGI)

MEncoder is capable of creating movies from one or more JPEG, PNG or TGA files. With simple framecopy it can create MJPEG (Motion JPEG), MPNG (Motion PNG) or MTGA (Motion TGA) files.

Explanation of the process: MEncoder decodes the input image(s) with libjpeg (when decoding PNGs, it will use libpng). MEncoder then feeds the decoded image to the chosen video compressor (DivX4, XviD, FFmpeg msmpeg4, etc.).

Examples

The explanation of the -mf option is in the man page.
Creating an MPEG-4 file from all the JPEG files in the current directory:

    mencoder mf://*.jpg -mf w=800:h=600:fps=25:type=jpg -ovc lavc -lavcopts vcodec=mpeg4 -oac copy -o output.avi

Creating an MPEG-4 file from some JPEG files in the current directory:

    mencoder mf://frame001.jpg,frame002.jpg -mf w=800:h=600:fps=25:type=jpg -ovc lavc -lavcopts vcodec=mpeg4 -oac copy -o output.avi

Creating a Motion JPEG (MJPEG) file from all the JPEG files in the current directory:

    mencoder mf://*.jpg -mf w=800:h=600:fps=25:type=jpg -ovc copy -oac copy -o output.avi

Creating an uncompressed file from all the PNG files in the current directory:

    mencoder mf://*.png -mf w=800:h=600:fps=25:type=png -ovc raw -oac copy -o output.avi

Width must be an integer multiple of 4; this is a limitation of the RAW RGB AVI format.

Creating a Motion PNG (MPNG) file from all the PNG files in the current directory:

    mencoder mf://*.png -mf w=800:h=600:fps=25:type=png -ovc copy -oac copy -o output.avi

Creating a Motion TGA (MTGA) file from all the TGA files in the current directory:

    mencoder mf://*.tga -mf w=800:h=600:fps=25:type=tga -ovc copy -oac copy -o output.avi

Extracting DVD subtitles to VOBsub file

MEncoder is capable of extracting subtitles from a DVD into VOBsub formatted files. They consist of a pair of files ending in .idx and .sub and are usually packaged in a single .rar archive. MPlayer can play these with the -vobsub and -vobsubid options.

You specify the basename (i.e. without the .idx or .sub extension) of the output files with -vobsubout and the index for this subtitle in the resulting files with -vobsuboutindex.

If the input is not from a DVD you should use -ifo to indicate the .ifo file needed to construct the resulting .idx file.

If the input is not from a DVD and you do not have the .ifo file you will need to use the -vobsuboutid option to let it know what language id to put in the .idx file.

Each run will append the running subtitle if the .idx and .sub files already exist. So you should remove any before starting.
Copying two subtitles from a DVD while doing two pass encoding

    rm subtitles.idx subtitles.sub
    mencoder dvd://1 -oac copy -ovc lavc -lavcopts vcodec=mpeg4:vpass=1 -vobsubout subtitles -vobsuboutindex 0 -sid 2
    mencoder dvd://1 -oac copy -ovc lavc -lavcopts vcodec=mpeg4:vpass=2 -vobsubout subtitles -vobsuboutindex 1 -sid 5

Copying a French subtitle from an MPEG file

    rm subtitles.idx subtitles.sub
    mencoder movie.mpg -ifo movie.ifo -vobsubout subtitles -vobsuboutindex 0 -vobsuboutid fr -sid 1

Preserving aspect ratio

DVDs and SVCDs (i.e. MPEG-1/2) files contain an aspect ratio value, which describes how the player should scale the video stream, so humans will not have egg heads (ex.: 480x480 + 4:3 = 640x480). However, when encoding to AVI (DivX) files, you have to be aware that AVI headers do not store this value. Rescaling the movie is disgusting and time consuming, there has to be a better way!

There is! MPEG-4 has a unique feature: the video stream can contain its needed aspect ratio. Yes, just like MPEG-1/2 (DVD, SVCD) and H.263 files. Regretfully, no video players other than MPlayer support this attribute of MPEG-4.

This feature can be used only with libavcodec's mpeg4 codec. Keep in mind: although MPlayer will correctly play the created file, other players will use the wrong aspect ratio.

You seriously should crop the black bands above and below the movie image. See the man page for the usage of the cropdetect and crop filters.

Usage

    mencoder sample-svcd.mpg -ovc lavc -lavcopts vcodec=mpeg4:autoaspect -vf crop=714:548:0:14 -oac copy -o output.avi

Custom inter/intra matrices

With this feature of libavcodec you are able to set custom intra (I-frames/keyframes) and inter (P-frames/predicted frames) matrices. It is supported by many of the codecs: mpeg1video and mpeg2video are reported as working.

A typical usage of this feature is to set the matrices preferred by the KVCD specifications.
The KVCD "Notch" Quantization Matrix:

Intra:

     8  9 12 22 26 27 29 34
     9 10 14 26 27 29 34 37
    12 14 18 27 29 34 37 38
    22 26 27 31 36 37 38 40
    26 27 29 36 39 38 40 48
    27 29 34 37 38 40 48 58
    29 34 37 38 40 48 58 69
    34 37 38 40 48 58 69 79

Inter:

    16 18 20 22 24 26 28 30
    18 20 22 24 26 28 30 32
    20 22 24 26 28 30 32 34
    22 24 26 30 32 32 34 36
    24 26 28 32 34 34 36 38
    26 28 30 32 34 36 38 40
    28 30 32 34 36 38 42 42
    30 32 34 36 38 40 42 44

Usage:

    $ mencoder input.avi -o output.avi -oac copy -ovc lavc -lavcopts inter_matrix=...:intra_matrix=...

    $ mencoder input.avi -ovc lavc -lavcopts vcodec=mpeg2video:intra_matrix=8,9,12,22,26,27,29,34,9,10,14,26,27,29,34,37,12,14,18,27,29,34,37,38,22,26,27,31,36,37,38,40,26,27,29,36,39,38,40,48,27,29,34,37,38,40,48,58,29,34,37,38,40,48,58,69,34,37,38,40,48,58,69,79:inter_matrix=16,18,20,22,24,26,28,30,18,20,22,24,26,28,30,32,20,22,24,26,28,30,32,34,22,24,26,30,32,32,34,36,24,26,28,32,34,34,36,38,26,28,30,32,34,36,38,40,28,30,32,34,36,38,42,42,30,32,34,36,38,40,42,44 -oac copy -o svcd.mpg

Making a high quality MPEG-4 ("DivX") rip of a DVD movie

One frequently asked question is "How do I make the highest quality rip for a given size?". Another question is "How do I make the highest quality DVD rip possible? I do not care about file size, I just want the best quality."

The latter question is perhaps at least somewhat wrongly posed. After all, if you do not care about file size, why not simply copy the entire MPEG-2 video stream from the DVD? Sure, your AVI will end up being 5GB, give or take, but if you want the best quality and do not care about size, this is certainly your best option.

In fact, the reason you want to transcode a DVD into MPEG-4 is specifically because you do care about file size.

It is difficult to offer a cookbook recipe on how to create a very high quality DVD rip. There are several factors to consider, and you should understand these details or else you are likely to end up disappointed with your results.
Below we will investigate some of these issues, and then have a look at an example. We assume you are using libavcodec to encode the video, although the theory applies to other codecs as well. If this seems to be too much for you, you should probably use one of the many fine frontends that are listed in the MEncoder section of our related projects page. That way, you should be able to achieve high quality rips without too much thinking, because most of those tools are designed to take clever decisions for you. Preparing to encode: Identifying source material and framerate Before you even think about encoding a movie, you need to take several preliminary steps. The first and most important step before you encode should be determining what type of content you are dealing with. If your source material comes from DVD or broadcast/cable/satellite TV, it will be stored in one of two formats: NTSC for North America and Japan, PAL for Europe, etc. It is important to realize, however, that this is just the formatting for presentation on a television, and often does not correspond to the original format of the movie. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in ugly combing (interlacing) artifacts in your encode. Besides being ugly, the artifacts also harm coding efficiency: You will get worse quality per bitrate. Identifying source framerate Here is a list of common types of source material, where you are likely to find them, and their properties: Standard Film: Produced for theatrical display at 24fps. PAL video: Recorded with a PAL video camera at 50 fields per second. A field consists of just the odd- or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. 
Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time, and thus they do not line up unless there is no motion. NTSC Video: Recorded with an NTSC video camera at 59.94 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL. Animation: Usually drawn at 24fps, but also comes in mixed-framerate varieties. Computer Graphics (CG): Can be any framerate, but some are more common than others; 23 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL. Old Film: Various lower framerates. Identifying source material Movies consisting of frames are referred to as progressive, while those consisting of independent fields are called either interlaced or video - though this latter term is ambiguous. To further complicate matters, some movies will be a mix of several of the above. The most important distinction to make between all of these formats is that some are frame-based, while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "pulldown", of which the infamous NTSC "3:2 telecine" is one variety. Unless the original material was also field-based (and the same fieldrate), you are getting the movie in a format other than the original. There are several common types of pulldown: PAL 2:2 pulldown: The nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original material is 24fps, this process speeds up the movie by 4%. PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown: Every 12th frame is shown for the duration of three fields, instead of just two. This avoids the 4% speedup issue, but makes the process much more difficult to reverse. 
It is usually seen in musical productions where adjusting the speed by 4% would seriously damage the musical score. NTSC 3:2 telecine: Frames are shown alternately for the duration of 3 fields or 2 fields. This gives a fieldrate 2.5 times the original framerate. The result is also slowed down very slightly from 60 fields per second to 59.94 fields per second to maintain NTSC fieldrate. NTSC 2:2 pulldown: Used for showing 30fps material on NTSC. Nice, just like 2:2 PAL pulldown. There are also methods for converting between NTSC and PAL video, but such topics are beyond the scope of this guide. If you encounter such a movie and want to encode it, your best bet is to find a copy in the original format. Conversion between these two formats is highly destructive and cannot be reversed cleanly, so your encode will greatly suffer if it is made from a converted source. When video is stored on DVD, consecutive pairs of fields are grouped as a frame, even though they are not intended to be shown at the same moment in time. The MPEG-2 standard used on DVD and digital TV provides both a way to encode the original progressive frames, and store in the header of each frame the number of fields for which it should be shown. If this method has been used, the movie will often be described as "soft-telecined", since the process only directs the DVD player to apply pulldown to the movie rather than altering the movie itself. This case is highly preferable since it can easily be reversed (actually ignored) by the encoder, and since it preserves maximal quality. However, many DVD and broadcast production studios do not use proper encoding techniques but instead produce movies with "hard telecine", where fields are actually duplicated in the encoded MPEG-2. The procedures for dealing with these cases will be covered later in this guide. 
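To make the hard-telecine case concrete, here is a minimal Python sketch of NTSC 3:2 telecine followed by the pairing of consecutive fields into stored frames. It is only an illustration of the pattern described above: the function names are hypothetical, film frames are represented by letters, and field parity is ignored.

```python
from itertools import cycle


def telecine_3_2(film_frames):
    """Show frames alternately for 3 and 2 fields, then pair the fields.

    Returns the stored frames as (first_field, second_field) tuples.
    """
    fields = []
    for frame, repeat in zip(film_frames, cycle((3, 2))):
        fields.extend([frame] * repeat)
    # Consecutive pairs of fields are grouped as one stored frame.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def combed(stored_frames):
    """A stored frame is combed when its two fields come from different film frames."""
    return [first != second for first, second in stored_frames]


stored = telecine_3_2("ABCD")
print(stored)          # four film frames become five stored frames
print(combed(stored))  # two of the five mix fields from different film frames
```

Four film frames yield ten fields (2.5 times the frame count), and two of the five resulting stored frames combine fields from different film frames, which is why hard-telecined material shows combing on two frames out of every five.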
For now, we leave you with some guides to identifying which type of material you are dealing with: NTSC regions: If MPlayer prints that the framerate has changed to 23.976 when watching your movie, and never changes back, it is almost certainly 24fps content that has been "soft telecined". If MPlayer shows the framerate switching back and forth between 23.976 and 29.97, and you see "combing" at times, then there are several possibilities. The 23.976 fps segments are almost certainly 24fps progressive content, "soft telecined", but the 29.97 fps parts could be either hard-telecined 24fps content or NTSC video content. Use the same guidelines as the following two cases to determine which. If MPlayer never shows the framerate changing, and every single frame with motion appears combed, your movie is NTSC video at 59.94 fields per second. If MPlayer never shows the framerate changing, and two frames out of every five appear combed, your movie is "hard telecined" 24fps content. PAL regions: If you never see any combing, your movie is 2:2 pulldown. If you see combing alternating in and out every half second, then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown. If you always see combing during motion, then your movie is PAL video at 50 fields per second. Hint: MPlayer can slow down movie playback with the -speed option. Try using 0.2 to watch the movie very slowly and identify the pattern, if you cannot see it at full speed. Constant quantizer vs. multipass It is possible to encode your movie at a wide range of qualities. With modern video encoders and a bit of pre-codec compression (downscaling and denoising), it is possible to achieve very good quality at 700 MB, for a 90-110 minute widescreen movie. Furthermore, all but the longest movies can be encoded with near-perfect quality at 1400 MB. There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and multipass (ABR, or average bitrate). 
Note: Most codecs which support ABR encoding only support two pass encoding, while some others such as x264 and libavcodec support multipass, which slightly improves quality at each pass. This improvement is no longer measurable or noticeable after the 4th or so pass. Therefore, in this section, two pass and multipass will be used interchangeably.

In each of these modes, libavcodec breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and higher the bitrate. The method libavcodec uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme over-simplification of the actual process, but the basic concept is useful to understand.)

When you specify a constant bitrate, libavcodec will encode the video, discarding detail as much as necessary and as little as possible in order to remain lower than the given bitrate. If you truly do not care about file size, you could just as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough so that it poses no limit, like 10000Kbit.) With no real restriction on bitrate, the result is that libavcodec will use the lowest possible quantizer for each macroblock (as specified by vqmin, which is 2 by default). As soon as you specify a low enough bitrate that libavcodec is forced to use a higher quantizer, then you are almost certainly ruining the quality of your video. In order to avoid that, you should probably downscale your video, according to the method described later on in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, libavcodec uses the same quantizer, as specified by the vqscale option, on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.
The problem with constant quantizing is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste the bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there are only so many bits on your hard disk.

With a two pass encode, the first pass will rip the movie as though it were CBR, but it will keep a log of properties for each frame. This data is then used during the second pass in order to make intelligent decisions about which quantizers to use. During fast action or low detail scenes, higher quantizers will likely be used, and during slow moving or high detail scenes, lower quantizers will be used.

If you use vqscale=2, then you are wasting bits. If you use vqscale=3, then you are not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality for the same bitrate.

Since you are now convinced that two pass is the way to go, the real question now is what bitrate to use? The answer is that there is no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size. This is going to vary depending on the source video.

If size does not matter, a good starting point for a very high quality rip is about 2000Kbit plus or minus 200Kbit. For fast action or high detail source video, or if you just have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It is a good idea to experiment with scenes at different bitrates to get a feel.

If you aim at a certain size, you will have to somehow calculate the bitrate. But before that, you need to know how much space you should reserve for the audio track(s), so you should rip those first.
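Once the audio size is known, the video bitrate follows from the remaining space and the running time. A minimal Python sketch of that arithmetic is given here; the function name is illustrative and not part of MEncoder, and it assumes the same units as this guide's equation (1 MB = 1024*1024 bytes, 1 kbit = 1000 bits).

```python
def video_bitrate_kbps(target_mb, audio_mb, length_secs):
    """Video bitrate in kbit/s that fills the space left over after the audio."""
    video_bytes = (target_mb - audio_mb) * 1024 * 1024
    return video_bytes / length_secs * 8 / 1000


# Two-hour movie on a 702 MB CD with 60 MB reserved for audio.
rate = video_bitrate_kbps(702, 60, 120 * 60)
print(f"{rate:.0f} kbit/s")
```

The rounded result is the figure you would then give to the encoder as the target video bitrate for both passes.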
You can compute the bitrate with the following equation:

    bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000

For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be:

    (702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 = 748kbps

Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video up into 16x16 squares called macroblocks, each composed of four 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for the red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multiples of 16, the encoder will use enough 16x16 macroblocks to cover the whole picture area, and the extra space will go to waste. So in the interests of maximizing quality at a fixed filesize, it is a bad idea to use dimensions that are not multiples of 16.

Most DVDs also have some degree of black borders at the edges. Leaving these in place can hurt quality in several ways.

MPEG-type compression is also highly dependent on frequency domain transformations, in particular the Discrete Cosine Transform (DCT), which is similar to the Fourier transform. This sort of encoding is efficient for representing patterns and smooth transitions, but it has a hard time with sharp edges. In order to encode them it must use many more bits, or else an artifact known as ringing will appear.

The frequency transform (DCT) takes place separately on each macroblock (actually each block), so this problem only applies when the sharp edge is inside a block. If your black borders begin exactly at multiple-of-16 pixel boundaries, this is not a problem. However, the black borders on DVDs rarely come nicely aligned, so in practice you will always need to crop to avoid this penalty.
In addition to frequency domain transforms, MPEG-type compression uses motion vectors to represent the change from one frame to the next. Motion vectors naturally work much less efficiently for new content coming in from the edges of the picture, because it is not present in the previous frame. As long as the picture extends all the way to the edge of the encoded region, motion vectors have no problem with content moving out the edges of the picture. However, in the presence of black borders, there can be trouble: For each macroblock, MPEG-type compression stores a vector identifying which part of the previous frame should be copied into this macroblock as a base for predicting the next frame. Only the remaining differences need to be encoded. If a macroblock spans the edge of the picture and contains part of the black border, then motion vectors from other parts of the picture will overwrite the black border. This means that lots of bits must be spent either re-blackening the border that was overwritten, or (more likely) a motion vector will not be used at all and all the changes in this macroblock will have to be coded explicitly. Either way, encoding efficiency is greatly reduced. Again, this problem only applies if black borders do not line up on multiple-of-16 boundaries. Finally, suppose we have a macroblock in the interior of the picture, and an object is moving into this block from near the edge of the image. MPEG-type coding cannot say "copy the part that is inside the picture but not the black border." So the black border will get copied inside too, and lots of bits will have to be spent encoding the part of the picture that is supposed to be there. If the picture runs all the way to the edge of the encoded area, MPEG has special optimizations to repeatedly copy the pixels at the edge of the picture when a motion vector comes from outside the encoded area. This feature becomes useless when the movie has black borders. 
Unlike problems 1 and 2, aligning the borders at multiples of 16 does not help here.

Despite the borders being entirely black and never changing, there is at least a minimal amount of overhead involved in having more macroblocks.

For all of these reasons, it is recommended to fully crop black borders. Further, if there is an area of noise/distortion at the edge of the picture, cropping this will improve encoding efficiency as well. Videophile purists who want to preserve the original as close as possible may object to this cropping, but unless you plan to encode at constant quantizer, the quality you gain from cropping will considerably exceed the amount of information lost at the edges.

Cropping and Scaling

Recall from the previous section that the final picture size you encode should be a multiple of 16 (in both width and height). This can be achieved by cropping, scaling, or a combination of both.

When cropping, there are a few guidelines that must be followed to avoid damaging your movie. The normal YUV format, 4:2:0, stores chroma (color) information subsampled, i.e. chroma is only sampled half as often in each direction as luma (intensity) information. Observe this diagram, where L indicates luma sampling points and C chroma.

    L   L   L   L   L   L   L   L
      C       C       C       C
    L   L   L   L   L   L   L   L

    L   L   L   L   L   L   L   L
      C       C       C       C
    L   L   L   L   L   L   L   L

As you can see, rows and columns of the image naturally come in pairs. Thus your crop offsets and dimensions must be even numbers. If they are not, the chroma will no longer line up correctly with the luma. In theory, it is possible to crop with odd offsets, but it requires resampling the chroma which is potentially a lossy operation and not supported by the crop filter.
Further, interlaced video is sampled as follows:

    Top field                        Bottom field

    L   L   L   L   L   L   L   L
      C       C       C       C
                                     L   L   L   L   L   L   L   L

    L   L   L   L   L   L   L   L
                                       C       C       C       C
                                     L   L   L   L   L   L   L   L

    L   L   L   L   L   L   L   L
      C       C       C       C
                                     L   L   L   L   L   L   L   L

    L   L   L   L   L   L   L   L
                                       C       C       C       C
                                     L   L   L   L   L   L   L   L

As you can see, the pattern does not repeat until after 4 lines. So for interlaced video, your y-offset and height for cropping must be multiples of 4.

Native DVD resolution is 720x480 for NTSC, and 720x576 for PAL, but there is an aspect flag that specifies whether it is full-screen (4:3) or wide-screen (16:9). Many (if not most) widescreen DVDs are not strictly 16:9, and will be either 1.85:1 or 2.35:1 (CinemaScope). This means that there will be black bands in the video that will need to be cropped out.

MPlayer provides a crop detection filter that will determine the crop rectangle (-vf cropdetect). Run MPlayer with -vf cropdetect and it will print out the crop settings to remove the borders. You should let the movie run long enough that the whole picture area is used, in order to get accurate crop values.

Then, test the values you get with MPlayer, using the command line which was printed by cropdetect, and adjust the rectangle as needed. The rectangle filter can help by allowing you to interactively position the crop rectangle over your movie. Remember to follow the above divisibility guidelines so that you do not misalign the chroma planes.

In certain cases, scaling may be undesirable. Scaling in the vertical direction is difficult with interlaced video, and if you wish to preserve the interlacing, you should usually refrain from scaling. If you will not be scaling but you still want to use multiple-of-16 dimensions, you will have to overcrop. Do not undercrop, since black borders are very bad for encoding!

Because MPEG-4 uses 16x16 macroblocks, you will want to make sure that each dimension of the video you are encoding is a multiple of 16 or else you will be degrading quality, especially at lower bitrates.
You can do this by rounding the width and height of the crop rectangle down to the nearest multiple of 16. As stated earlier, when cropping, you will want to increase the Y offset by half the difference of the old and the new height so that the resulting video is taken from the center of the frame. And because of the way DVD video is sampled, make sure the offset is an even number. (In fact, as a rule, never use odd values for any parameter when you are cropping and scaling video.) If you are not comfortable throwing a few extra pixels away, you might prefer to scale the video instead. We will look at this in our example below. You can actually let the cropdetect filter do all of the above for you, as it has an optional round parameter that is equal to 16 by default.

Also, be careful about "half black" pixels at the edges. Make sure you crop these out too, or else you will be wasting bits there that are better spent elsewhere.

After all is said and done, you will probably end up with video whose pixels are not quite 1.85:1 or 2.35:1, but rather something close to that. You could calculate the new aspect ratio manually, but MEncoder offers an option for libavcodec called autoaspect that will do this for you. Absolutely do not scale this video up in order to square the pixels unless you like to waste your hard disk space. Scaling should be done on playback, and the player will use the aspect stored in the AVI to determine the correct resolution. Unfortunately, not all players enforce this auto-scaling information, therefore you may still want to rescale.
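Before moving on to rescaling, the manual crop adjustment described above (round the size down to a multiple of 16, recenter, keep offsets even) can be sketched in Python. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, and the incoming offsets are assumed to be even already.

```python
def round_crop(w, h, x, y, block=16, x_align=2, y_align=2):
    """Shrink a crop rectangle to multiples of `block` and recenter it.

    The offset shift is half the size difference, rounded down to a
    multiple of the alignment so that an aligned input offset stays
    aligned. Use y_align=4 for interlaced material.
    """
    new_w = w // block * block
    new_h = h // block * block
    dx = (w - new_w) // 2 // x_align * x_align
    dy = (h - new_h) // 2 // y_align * y_align
    return new_w, new_h, x + dx, y + dy


# The crop values used in the aspect ratio example of this guide.
w, h, x, y = round_crop(714, 548, 0, 14)
print(f"-vf crop={w}:{h}:{x}:{y}")
```

Because the shift is rounded down, the new rectangle always stays inside the original one, so no border pixels are brought back in.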
First, you should compute the encoded aspect ratio:

    ARc = (Wc x (ARa / PRdvd)) / Hc

where:

Wc and Hc are the width and height of the cropped video,
ARa is the displayed aspect ratio, which usually is 4/3 or 16/9,
PRdvd is the pixel ratio of the DVD, which is equal to 1.25=(720/576) for PAL DVDs and 1.5=(720/480) for NTSC DVDs.

Then, you can compute the X and Y resolution, according to a certain Compression Quality (CQ) factor:

    ResY = INT(SQRT( 1000*Bitrate/25/ARc/CQ )/16) * 16

and

    ResX = INT( ResY * ARc / 16) * 16

Okay, but what is the CQ? The CQ represents the number of bits per pixel and per frame of the encode. Roughly speaking, the greater the CQ, the lower the likelihood of seeing encoding artifacts. However, if you have a target size for your movie (1 or 2 CDs for instance), there is a limited total number of bits that you can spend; therefore it is necessary to find a good tradeoff between compressibility and quality.

The CQ depends both on the bitrate and the movie resolution. In order to raise the CQ, typically you would downscale the movie, given that the bitrate is computed as a function of the target size and the length of the movie, which are constant. A CQ below 0.18 usually ends up in a very blocky picture, because there are not enough bits to code the information of each macroblock (MPEG-4, like many other codecs, groups pixels by blocks of several pixels to compress the image; if there are not enough bits, the edges of those blocks are visible). It is therefore wise to take a CQ ranging from 0.20 to 0.22 for a 1 CD rip, and 0.26-0.28 for 2 CDs.

Please take note that the CQ is just an indicative figure, as depending on the encoded content, a CQ of 0.18 may look just fine for a Bergman, contrary to a movie such as The Matrix, which contains many high-motion scenes. On the other hand, it is pointless to raise CQ higher than 0.30 as you would be wasting bits without any noticeable quality gain.
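The resolution formulas above can be tried out with a short Python sketch. The function names are illustrative, the frame rate defaults to the 25 fps used in the formula, and the example input (a PAL 16:9 DVD cropped to 720x432 and encoded at 748 kbit/s) is an assumption chosen only to show the arithmetic.

```python
import math


def encoded_aspect(crop_w, crop_h, display_aspect, dvd_pixel_ratio):
    """ARc = (Wc x (ARa / PRdvd)) / Hc"""
    return (crop_w * (display_aspect / dvd_pixel_ratio)) / crop_h


def resolution_for_cq(bitrate_kbps, arc, cq, fps=25):
    """ResX and ResY for a target CQ, both rounded down to multiples of 16."""
    res_y = int(math.sqrt(1000 * bitrate_kbps / fps / arc / cq) / 16) * 16
    res_x = int(res_y * arc / 16) * 16
    return res_x, res_y


def actual_cq(bitrate_kbps, res_x, res_y, fps=25):
    """Bits per pixel and per frame actually available at this resolution."""
    return 1000 * bitrate_kbps / (fps * res_x * res_y)


arc = encoded_aspect(720, 432, 16 / 9, 1.25)
res_x, res_y = resolution_for_cq(748, arc, cq=0.22)
print(res_x, res_y, round(actual_cq(748, res_x, res_y), 3))
```

Since both dimensions are rounded down, the CQ obtained at the final resolution ends up somewhat higher than the requested one, so it is worth recomputing it as the last line does.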
Audio Audio is a much simpler problem to solve: if you care about quality, just leave it as is. Even AC3 5.1 streams are at most 448Kbit/s, and they are worth every bit. You might be tempted to transcode the audio to high quality Vorbis, but just because you do not have an A/V receiver for AC3 pass-through today does not mean you will not have one tomorrow. Future-proof your DVD rips by preserving the AC3 stream. You can keep the AC3 stream by copying it directly into the video stream during the encoding. You can also extract the AC3 stream in order to mux it into containers such as NUT or Matroska. mplayer source_file.vob -aid 129 -dumpaudio -dumpfile sound.ac3 will dump into the file sound.ac3 the audio track number 129 from the file source_file.vob (NB: DVD VOB files usually use a different audio numbering, which means that the VOB audio track 129 is the 2nd audio track of the file). But sometimes you truly have no choice but to further compress the sound so that more bits can be spent on the video. Most people choose to compress audio with either MP3 or Vorbis audio codecs. While the latter is a very space-efficient codec, MP3 is better supported by hardware players, although this trend is changing. First of all, you will have to convert the DVD sound into a WAV file that the audio codec can use as input. For example: mplayer source_file.vob -ao pcm:file=destination_sound.wav -vc dummy -aid 1 -vo null will dump the second audio track from the file source_file.vob into the file destination_sound.wav. You may want to normalize the sound before encoding, as DVD audio tracks are commonly recorded at low volumes. You can use the tool normalize for instance, which is available in most distributions. If you are using Windows, a tool such as BeSweet can do the same job. You will compress in either Vorbis or MP3. 
For example: oggenc -q1 destination_sound.wav will encode destination_sound.wav with the encoding quality 1, which is roughly equivalent to 80Kb/s, and is the minimum quality at which you should encode if you care about quality. Please note that MEncoder currently cannot mux Vorbis audio tracks into the output file because it only supports AVI and MPEG containers as an output, each of which may lead to audio/video playback synchronization problems with some players when the file contains VBR audio streams such as Vorbis. Do not worry, this document will show you how you can do that with third party programs. Interlacing and Telecine Almost all movies are shot at 24 fps. Because NTSC is 30000/1001 fps, some processing must be done to this 24 fps video to make it run at the correct NTSC framerate. The process is called 3:2 pulldown, commonly referred to as telecine (because pulldown is often applied during the telecine process), and, naively described, it works by slowing the film down to 24000/1001 fps, and repeating every fourth frame. No special processing, however, is done to the video for PAL DVDs, which run at 25 fps. (Technically, PAL can be telecined, called 2:2 pulldown, but this does not become an issue in practice.) The 24 fps film is simply played back at 25 fps. The result is that the movie runs slightly faster, but unless you are an alien, you probably will not notice the difference. Most PAL DVDs have pitch-corrected audio, so when they are played back at 25 fps things will sound right, even though the audio track (and hence the whole movie) has a running time that is 4% less than NTSC DVDs. Because the video in a PAL DVD has not been altered, you needn't worry much about frame rate. The source is 25 fps, and your rip will be 25 fps. However, if you are ripping an NTSC DVD movie, you may need to apply inverse telecine. 
For movies shot at 24 fps, the video on the NTSC DVD is either telecined 30000/1001, or else it is progressive 24000/1001 fps and intended to be telecined on-the-fly by a DVD player. On the other hand, TV series are usually only interlaced, not telecined. This is not a hard rule: some TV series are interlaced (such as Buffy the Vampire Slayer) whereas some are a mixture of progressive and interlaced (such as Angel, or 24). It is highly recommended that you read the section on How to deal with telecine and interlacing in NTSC DVDs to learn how to handle the different possibilities. However, if you are mostly just ripping movies, likely you are either dealing with 24 fps progressive or telecined video, in which case you can use the filter . Encoding interlaced video If the movie you want to encode is interlaced (NTSC video or PAL video), you will need to choose whether you want to deinterlace or not. While deinterlacing will make your movie usable on progressive scan displays such as computer monitors and projectors, it comes at a cost: the fieldrate of 50 or 59.94 fields per second is halved to 25 or 29.97 frames per second, and roughly half of the information in your movie will be lost during scenes with significant motion. Therefore, if you are encoding for high quality archival purposes, it is recommended not to deinterlace. You can always deinterlace the movie at playback time when displaying it on progressive scan devices, and future players will be able to deinterlace to full fieldrate, interpolating 50 or 59.94 entire frames per second from the interlaced video. Special care must be taken when working with interlaced video: Crop height and y-offset must be multiples of 4. Any vertical scaling must be performed in interlaced mode. Postprocessing and denoising filters may not work as expected unless you take special care to operate them a field at a time, and they may damage the video if used incorrectly. 
With these things in mind, here is our first example: mencoder capture.avi -mc 0 -oac lavc -ovc lavc -lavcopts \ vcodec=mpeg2video:vbitrate=6000:ilmv:ildct:acodec=mp2:abitrate=224 Note the and options. Filtering In general, you want to do as little filtering as possible to the movie in order to remain close to the original DVD source. Cropping is often necessary (as described above), but do not scale the video. Although scaling down is sometimes preferred to using higher quantizers, we want to avoid both these things: remember that we decided from the start to trade bits for quality. Also, do not adjust gamma, contrast, brightness, etc. What looks good on your display may not look good on others. These adjustments should be done on playback only. One thing you might want to do, however, is pass the video through a very light denoise filter, such as . Again, it is a matter of putting those bits to better use: why waste them encoding noise when you can just add that noise back in during playback? Increasing the parameters for will further improve compressibility, but if you increase the values too much, you risk degrading the image visibly. The suggested values above () are quite conservative; you should feel free to experiment with higher values and observe the results for yourself. Encoding options of libavcodec Ideally, you would probably want to be able to just tell the encoder to switch into "high quality" mode and move on. That would probably be nice, but unfortunately hard to implement as different encoding options yield different quality results depending on the source material. That is because compression depends on the visual properties of the video in question. For example, anime and live action have very different properties and thus require different options to obtain optimum encoding. The good news is that some options should never be left out, like , , and . See below for a detailed description of common encoding options. 
Options to adjust: vmax_b_frames: 1 or 2 is good, depending on the movie. Note that libavcodec does not yet support closed GOP (the option does not currently work), so DivX5 will not be able to decode anything encoded with B-frames. vb_strategy=1: helps in high-motion scenes. Requires vmax_b_frames >= 2. On some videos, vmax_b_frames may hurt quality, but vmax_b_frames=2 along with vb_strategy=1 helps. dia: motion search range. Bigger is better and slower. Negative values are a completely different scale. Good values are -1 for a fast encode, or 2-4 for slower. predia: motion search pre-pass. Not as important as dia. Good values are 1 (default) to 4. Requires preme=2 to really be useful. cmp, subcmp, precmp: Comparison function for motion estimation. Experiment with values of 0 (default), 2 (hadamard), 3 (dct), and 6 (rate distortion). 0 is fastest, and sufficient for precmp. For cmp and subcmp, 2 is good for anime, and 3 is good for live action. 6 may or may not be slightly better, but is slow. last_pred: Number of motion predictors to take from the previous frame. 1-3 or so help at little speed cost. Higher values are slow for no extra gain. cbp, mv0: Controls the selection of macroblocks. Small speed cost for small quality gain. qprd: adaptive quantization based on the macroblock's complexity. May help or hurt depending on the video and other options. This can cause artifacts unless you set vqmax to some reasonably small value (6 is good, maybe as low as 4); vqmin=1 should also help. qns: very slow, especially when combined with qprd. This option will make the encoder minimize noise due to compression artifacts instead of making the encoded video strictly match the source. Do not use this unless you have already tweaked everything else as far as it will go and the results still are not good enough. vqcomp: Tweak ratecontrol. What values are good depends on the movie. You can safely leave this alone if you want. 
Reducing vqcomp puts more bits on low-complexity scenes, increasing it puts them on high-complexity scenes (default: 0.5, range: 0-1. recommended range: 0.5-0.7). vlelim, vcelim: Sets the single coefficient elimination threshold for luminance and chroma planes. These are encoded separately in all MPEG-like algorithms. The idea behind these options is to use some good heuristics to determine when the change in a block is less than the threshold you specify, and in such a case, to just encode the block as "no change". This saves bits and perhaps speeds up encoding. vlelim=-4 and vcelim=9 seem to be good for live movies, but seem not to help with anime; when encoding animation, you should probably leave them unchanged. qpel: Quarter pixel motion estimation. MPEG-4 uses half pixel precision for its motion search by default, therefore this option comes with an overhead as more information will be stored in the encoded file. The compression gain/loss depends on the movie, but it is usually not very effective on anime. qpel always incurs a significant cost in CPU decode time (+20% in practice). psnr: does not affect the actual encoding, but writes a log file giving the type/size/quality of each frame, and prints a summary of PSNR (Peak Signal to Noise Ratio) at the end. Options not recommended to play with: vme: The default is best. lumi_mask, dark_mask: Psychovisual adaptive quantization. You do not want to play with those options if you care about quality. Reasonable values may be effective in your case, but be warned this is very subjective. scplx_mask: Tries to prevent blocky artifacts, but postprocessing is better. Example So, you have just bought your shiny new copy of Harry Potter and the Chamber of Secrets (widescreen edition, of course), and you want to rip this DVD so that you can add it to your Home Theatre PC. This is a region 1 DVD, so it is NTSC. 
The example below will still apply to PAL, except you will omit (because the output framerate is the same as the input framerate), and of course the crop dimensions will be different. After running , we follow the process detailed in the section How to deal with telecine and interlacing in NTSC DVDs and discover that it is 24000/1001 fps progressive video, which means that we needn't use an inverse telecine filter, such as or . Next, we want to determine the appropriate crop rectangle, so we use the cropdetect filter: mplayer dvd://1 -vf cropdetect Make sure you seek to a fully filled frame (such as a bright scene), and you will see in MPlayer's console output: crop area: X: 0..719 Y: 57..419 (-vf crop=720:362:0:58) We then play the movie back with this filter to test its correctness: mplayer dvd://1 -vf crop=720:362:0:58 And we see that it looks perfectly fine. Next, we ensure the width and height are a multiple of 16. The width is fine, however the height is not. Since we did not fail 7th grade math, we know that the nearest multiple of 16 lower than 362 is 352. We could just use , but it would be nice to take a little off the top and a little off the bottom so that we retain the center. We have shrunk the height by 10 pixels, but we do not want to increase the y-offset by 5 pixels since that is an odd number and will adversely affect quality. Instead, we will increase the y-offset by 4 pixels: mplayer dvd://1 -vf crop=720:352:0:62 Another reason to shave pixels from both the top and the bottom is that we ensure we have eliminated any half-black pixels if they exist. Note that if your video is telecined, make sure the filter (or whichever inverse telecine filter you decide to use) appears in the filter chain before you crop. If it is interlaced, deinterlace before cropping. (If you choose to preserve the interlaced video, then make sure your vertical crop offset is a multiple of 4.) 
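If you would rather not do the crop arithmetic by hand, the steps we just walked through can be sketched in a few lines of Python. The helper name round_crop is hypothetical, and the sketch assumes the offsets reported by cropdetect are already even; it does not handle the multiple-of-4 rule for interlaced video.

```python
def round_crop(width, height, x, y, multiple=16):
    """Round a crop rectangle down to a multiple of 16 and re-center it.

    The offsets grow by half of the removed pixels, with that half
    rounded down to an even number so the offsets stay even
    (assumption: the incoming x and y are already even).
    """
    new_w = width - width % multiple
    new_h = height - height % multiple

    def even_half(diff):
        half = diff // 2
        return half - half % 2  # never shift by an odd amount

    new_x = x + even_half(width - new_w)
    new_y = y + even_half(height - new_h)
    return new_w, new_h, new_x, new_y

# The rectangle suggested by cropdetect in the example: crop=720:362:0:58
w, h, x, y = round_crop(720, 362, 0, 58)
print(f"crop={w}:{h}:{x}:{y}")
```

For the rectangle in our example this gives the same crop=720:352:0:62 that we worked out manually.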
If you are really concerned about losing those 10 pixels, you might prefer instead to scale the dimensions down to the nearest multiple of 16. The filter chain would look like: -vf crop=720:362:0:58,scale=720:352 Scaling the video down like this will mean that some small amount of detail is lost, though it probably will not be perceptible. Scaling up will result in lower quality (unless you increase the bitrate). Cropping discards those pixels altogether. It is a tradeoff that you will want to consider for each circumstance. For example, if the DVD video was made for television, you might want to avoid vertical scaling, since the line sampling corresponds to the way the content was originally recorded. On inspection, we see that our movie has a fair bit of action and high amounts of detail, so we pick 2400Kbit for our bitrate. We are now ready to do the two pass encode. Pass one: mencoder dvd://1 -ofps 24000/1001 -oac copy -vf crop=720:352:0:62,hqdn3d=2:1:2 -ovc lavc \ -lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:mbcmp=3:autoaspect:vpass=1 \ -o Harry_Potter_2.avi And pass two is the same, except that we specify : mencoder dvd://1 -ofps 24000/1001 -oac copy -vf crop=720:352:0:62,hqdn3d=2:1:2 -ovc lavc \ -lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:mbcmp=3:autoaspect:vpass=2 \ -o Harry_Potter_2.avi The options will greatly increase the quality at the expense of encoding time. There is little reason to leave these options out when the primary goal is quality. The options select a comparison function that yields higher quality than the defaults. You might try experimenting with this parameter (refer to the man page for the possible values) as different functions can have a large impact on quality depending on the source material. For example, if you find libavcodec produces too much blocky artifacting, you could try selecting the experimental NSSE as comparison function via . 
For this movie, the resulting AVI will be 138 minutes long and nearly 3GB. And because you said that file size does not matter, this is a perfectly acceptable size. However, if you had wanted it smaller, you could try a lower bitrate. Increasing the bitrate has diminishing returns, so while we might clearly see an improvement from 1800Kbit to 2000Kbit, it might not be so noticeable above 2000Kbit. Feel free to experiment until you are happy. Because we passed the source video through a denoise filter, you may want to add some of it back during playback. This, along with the post-processing filter, drastically improves the perception of quality and helps eliminate blocky artifacts in the video. With MPlayer's option, you can vary the amount of post-processing done by the spp filter depending on available CPU. Also, at this point, you may want to apply gamma and/or color correction to best suit your display. For example: mplayer Harry_Potter_2.avi -vf spp,noise=9ah:5ah,eq2=1.2 -autoq 3 Muxing Now that you have encoded your video, you will most likely want to mux it with one or more audio tracks into a movie container, such as AVI, MPEG, Matroska or NUT. MEncoder is currently only able to output audio and video into MPEG and AVI container formats. For example: mencoder -oac copy -ovc copy -o output_movie.avi -audiofile input_audio.mp2 input_video.avi This would merge the video file input_video.avi and the audio file input_audio.mp2 into the AVI file output_movie.avi. This command works with MPEG-1 layer I, II and III (more commonly known as MP3) audio, WAV and a few other audio formats too. MEncoder features experimental support for libavformat, which is a library from the FFmpeg project that supports muxing and demuxing a variety of containers. 
For example: mencoder -oac copy -ovc copy -o output_movie.asf -audiofile input_audio.mp2 input_video.avi -of lavf -lavfopts format=asf This will do the same thing as the previous example, except that the output container will be ASF. Please note that this support is highly experimental (but getting better every day), and will only work if you compiled MPlayer with the support for libavformat enabled (which means that a pre-packaged binary version will not work in most cases). Limitations of the AVI container Although it is the most widely-supported container format after MPEG-1, AVI also has some major drawbacks. Perhaps the most obvious is the overhead. For each chunk of the AVI file, 24 bytes are wasted on headers and index. This translates into a little over 5 MB per hour, or 1-2.5% overhead for a 700 MB movie. This may not seem like much, but it could mean the difference between being able to use 700 kbit/sec video or 714 kbit/sec, and every bit of quality counts. In addition to this gross inefficiency, AVI also has the following major limitations: Only fixed-fps content can be stored. This is particularly limiting if the original material you want to encode is mixed content, for example a mix of NTSC video and film material. Actually there are hacks that can be used to store mixed-framerate content in AVI, but they increase the (already huge) overhead fivefold or more and so are not practical. Audio in AVI files must be either constant-bitrate (CBR) or constant-framesize (i.e. all frames decode to the same number of samples). Unfortunately, the most efficient codec, Vorbis, does not meet either of these requirements. Therefore, if you plan to store your movie in AVI, you will have to use a less efficient codec such as MP3 or AC3. Having said all that, MEncoder does not currently support variable-fps output or Vorbis encoding. Therefore, you may not see these as limitations if MEncoder is the only tool you will be using to produce your encodes. 
However, it is possible to use MEncoder only for video encoding, and then use external tools to encode audio and mux it into another container format. Muxing into the Matroska container Matroska is a free, open standard container format, aiming to offer a lot of advanced features, which older containers like AVI cannot handle. For example, Matroska supports variable bitrate audio content (VBR), variable framerates (VFR), chapters, file attachments, error detection code (EDC) and modern A/V codecs like "Advanced Audio Coding" (AAC), "Vorbis" or "MPEG-4 AVC" (H.264), almost none of which AVI can handle. The tools required to create Matroska files are collectively called mkvtoolnix, and are available for most Unix platforms as well as Windows. Because Matroska is an open standard you may find other tools that suit you better, but since mkvtoolnix is the most common, and is supported by the Matroska team itself, we will only cover its usage. Probably the easiest way to get started with Matroska is to use MMG, the graphical frontend shipped with mkvtoolnix, and follow the guide to mkvmerge GUI (mmg). You may also mux audio and video files using the command line: mkvmerge -o output.mkv input_video.avi input_audio1.mp3 input_audio2.ac3 This would merge the video file input_video.avi and the two audio files input_audio1.mp3 and input_audio2.ac3 into the Matroska file output.mkv. Matroska, as mentioned earlier, is able to do much more than that, like multiple audio tracks (including fine-tuning of audio/video synchronization), chapters, subtitles, splitting, etc. Please refer to the documentation of those applications for more details. Encoding with the <systemitem class="library">x264</systemitem> codec x264 is a free library for encoding H.264/AVC video streams. Before starting to encode, you need to set up MEncoder to support it. What options should I use to get the best results? Please begin by reviewing the x264 section of MPlayer's man page. 
This section is intended to be a supplement to the man page. There are mainly three types of considerations when choosing encoding options: Trading off encoding time vs. quality Frame type decision options Ratecontrol and quantization decision options This guide is mostly concerned with the first class of options. The other two types often have more to do with personal preferences and individual requirements. Before continuing, please note that this guide uses only one quality metric: global PSNR. For a brief explanation of what PSNR is, see the Wikipedia article on PSNR. Global PSNR is the last PSNR number reported when you include the option in . Any time you read a claim about PSNR, one of the assumptions behind the claim is that equal bitrates are used. Nearly all of this guide's comments assume you are using two pass. When comparing options, there are two major reasons for using two pass encoding. First, using two pass often gains around 1dB PSNR, which is a very big difference. Secondly, testing options by doing direct quality comparisons with one pass encodes is a dubious proposition because bitrate often varies significantly with each encode. It is not always easy to tell whether quality changes are due mainly to changed options, or if they mostly reflect differences in the achieved bitrate. Of the options which allow you to trade off speed for quality, and are usually by far the most important. If you are interested in tweaking either speed or quality, these are the first options you should consider. On the speed dimension, the and options interact with each other fairly strongly. Experience shows that, with one reference frame, takes about 35% more time than . With 6 reference frames, the penalty grows to over 60%. 's effect on PSNR seems fairly constant regardless of the number of reference frames. Typically, gains 0.2-0.5 dB global PSNR over . This is usually enough to be visible. 
Encoding options of x264 frameref: is set to 1 by default, but this should not be taken to imply that it is reasonable to set it to 1. Merely raising to 2 gains around 0.15dB PSNR with a 5-10% speed penalty; this seems like a good tradeoff. gains around 0.25dB PSNR over , which should be a visible difference. is around 15% slower than . Unfortunately, diminishing returns set in rapidly. can be expected to gain only 0.05-0.1 dB over at an additional 15% speed penalty. Above , the quality gains are usually very small (although you should keep in mind throughout this whole discussion that it can vary quite a lot depending on your source). In a fairly typical case, will improve global PSNR by a tiny 0.02dB over , at a speed cost of 15%-20%. At such high values, the only really good thing that can be said is that increasing even further will almost certainly never harm PSNR, but the additional quality benefits are barely even measurable, let alone perceptible. Note: Raising to unnecessarily high values can and usually does hurt coding efficiency if you turn CABAC off. With CABAC on (the default behavior), the possibility of setting "too high" currently seems too remote to even worry about, and in the future, optimizations may remove the possibility altogether. If you care about speed, a reasonable compromise is to use low and values on the first pass, and then raise them on the second pass. Typically, this has a negligible negative effect on the final quality: You will probably lose well under 0.1dB PSNR, which should be much too small of a difference to see. However, different values of can occasionally affect frametype decision. Most likely, these are rare outlying cases, but if you want to be pretty sure, consider whether your video has either fullscreen repetitive flashing patterns or very large temporary occlusions which might force an I-frame. Adjust the first-pass so it is large enough to contain the duration of the flashing cycle (or occlusion). 
For example, if the scene flashes back and forth between two images over a duration of three frames, set the first pass to 3 or higher. This issue is probably extremely rare in live action video material, but it does sometimes come up in video game captures. bframes: The usefulness of B-frames is questionable in most other codecs you may be used to. In H.264, this has changed: there are new techniques and block types that are possible in B-frames. Usually, even a naive B-frame choice algorithm can have a significant PSNR benefit. It is also interesting to note that if you turn off the adaptive B-frame decision (), encoding with usually speeds up encoding speed somewhat. With adaptive B-frame decision turned off ('s ), the optimal value for this setting will usually range from to . With adaptive B-frame decision on (the default behavior), it is probably safe to use higher values; the encoder will try to reduce the use of B-frames in scenes where they would hurt compression. If you are going to use at all, consider setting the maximum number of B-frames to 2 or higher in order to take advantage of weighted prediction. b_adapt: Note: This is on by default. With this option enabled, the encoder will use some simple heuristics to reduce the number of B-frames used in scenes that might not benefit from them as much. You can use to tweak how B-frame-happy the encoder is. The speed penalty of adaptive B-frames is currently rather modest, but so is the potential quality gain. It usually does not hurt, however. Note that this only affects speed and frametype decision on the first pass. and have no effect on subsequent passes. b_pyramid: You might as well enable this option if you are using >=2 B-frames; as the man page says, you get a little quality improvement at no speed cost. Note that these videos cannot be read by libavcodec-based decoders older than about March 5, 2005. weight_b: In typical cases, there is not much gain with this option. 
However, in crossfades or fade-to-black scenes, weighted prediction gives rather large bitrate savings. In MPEG-4 ASP, a fade-to-black is usually best coded as a series of expensive I-frames; using weighted prediction in B-frames makes it possible to turn at least some of these into much more reasonably-sized B-frames. Encoding time cost seems to be minimal, if there is any. Also, contrary to what some people seem to guess, the decoder CPU requirements are not much affected by weighted prediction, all else being equal. Unfortunately, the current adaptive B-frame decision algorithm has a strong tendency to avoid B-frames during fades. Until this changes, it may be a good idea to add to your x264encopts, if you expect fades to have a significant effect in your particular video clip. deblockalpha, deblockbeta: This topic is going to be a bit controversial. H.264 defines a simple deblocking procedure on I-blocks that uses pre-set strengths and thresholds depending on the QP of the block in question. By default, high QP blocks are filtered heavily, and low QP blocks are not deblocked at all. The pre-set strengths defined by the standard are well-chosen and the odds are very good that they are PSNR-optimal for whatever video you are trying to encode. The and parameters allow you to specify offsets to the preset deblocking thresholds. Many people seem to think it is a good idea to lower the deblocking filter strength by large amounts (say, -3). This is however almost never a good idea, and in most cases, people who are doing this do not understand very well how deblocking works by default. The first and most important thing to know about the in-loop deblocking filter is that the default thresholds are almost always PSNR-optimal. In the rare cases that they are not optimal, the ideal offset is plus or minus 1. Adjusting deblocking parameters by a larger amount is almost guaranteed to hurt PSNR. 
Strengthening the filter will smear more details; weakening the filter will increase the appearance of blockiness. It is definitely a bad idea to lower the deblocking thresholds if your source is mainly low in spatial complexity (i.e., not a lot of detail or noise). The in-loop filter does a rather excellent job of concealing the artifacts that occur. If the source is high in spatial complexity, however, artifacts are less noticeable. This is because the ringing tends to look like detail or noise. Human visual perception easily notices when detail is removed, but it does not so easily notice when the noise is wrongly represented. When it comes to subjective quality, noise and detail are somewhat interchangeable. By lowering the deblocking filter strength, you are most likely increasing error by adding ringing artifacts, but the eye does not notice because it confuses the artifacts with detail. This still does not justify lowering the deblocking filter strength, however. You can generally get better quality noise from postprocessing. If your H.264 encodes look too blurry or smeared, try playing with when you play your encoded movie. should conceal most mild artifacting. It will almost certainly look better than the results you would have gotten just by fiddling with the deblocking filter. How to deal with telecine and interlacing within NTSC DVDs Introduction What is telecine? I suggest you visit this page if you do not understand much of what is written in this document: http://www.divx.com/support/guides/guide.php?gid=10 This URL links to an understandable and reasonably comprehensive description of what telecine is. A note about the numbers. Many documents, including the guide linked above, refer to the fields per second value of NTSC video as 59.94 and the corresponding frames per second values as 29.97 (for telecined and interlaced) and 23.976 (for progressive). For simplicity, some documents even round these numbers to 60, 30, and 24. 
Strictly speaking, all those numbers are approximations. Black and white NTSC video was exactly 60 fields per second, but 60000/1001 was later chosen to accommodate color data while remaining compatible with contemporary black and white televisions. Digital NTSC video (such as on a DVD) is also 60000/1001 fields per second. From this, interlaced and telecined video are derived to be 30000/1001 frames per second; progressive video is 24000/1001 frames per second. Older versions of the MEncoder documentation and many archived mailing list posts refer to 59.94, 29.97, and 23.976. All MEncoder documentation has been updated to use the fractional values, and you should use them too. is incorrect. should be used instead. How telecine is used. All video intended to be displayed on an NTSC television set must be 60000/1001 fields per second. Made-for-TV movies and shows are often filmed directly at 60000/1001 fields per second, but the majority of cinema is filmed at 24 or 24000/1001 frames per second. When cinematic movie DVDs are mastered, the video is then converted for television using a process called telecine. On a DVD, the video is never actually stored as 60000/1001 fields per second. For video that was originally 60000/1001, each pair of fields is combined to form a frame, resulting in 30000/1001 frames per second. Hardware DVD players then read a flag embedded in the video stream to determine whether the odd- or even-numbered lines should form the first field. Usually, 24000/1001 frames per second content stays as it is when encoded for a DVD, and the DVD player must perform telecining on-the-fly. Sometimes, however, the video is telecined before being stored on the DVD; even though it was originally 24000/1001 frames per second, it becomes 60000/1001 fields per second. When it is stored on the DVD, pairs of fields are combined to form 30000/1001 frames per second. 
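To make the note about the numbers concrete, here is a small sketch using Python's fractions module. It derives the frame rates from the 60000/1001 field rate and shows that the familiar decimal values are only approximations. The helper drift_frames and the two-hour duration are made up for illustration.

```python
from fractions import Fraction

# Exact NTSC rates, derived from the field rate given in the text.
NTSC_FIELD_RATE = Fraction(60000, 1001)        # fields per second
NTSC_FRAME_RATE = NTSC_FIELD_RATE / 2          # two fields per frame: 30000/1001
FILM_RATE = NTSC_FRAME_RATE * Fraction(4, 5)   # 4 film frames per 5 video frames: 24000/1001

def drift_frames(exact, approx, seconds):
    """How many frames apart two frame rates end up after the given time."""
    return (exact - approx) * seconds

print(NTSC_FRAME_RATE, float(NTSC_FRAME_RATE))
print(FILM_RATE, float(FILM_RATE))

# The decimal shorthand is close to, but not equal to, the exact value.
print(Fraction("29.97") == NTSC_FRAME_RATE)
print(float(drift_frames(NTSC_FRAME_RATE, Fraction("29.97"), 2 * 60 * 60)))
```

The drift is small, but since the exact fraction costs nothing to type, there is no reason to carry the error around.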
When looking at individual frames formed from 60000/1001 fields per second video, telecined or otherwise, interlacing is clearly visible wherever there is any motion, because one field (say, the even-numbered lines) represents a moment in time 1/(60000/1001) seconds later than the other. Playing interlaced video on a computer looks ugly both because the monitor is higher resolution and because the video is shown frame-after-frame instead of field-after-field. Notes: This section only applies to NTSC DVDs, and not PAL. The example MEncoder lines throughout the document are not intended for actual use. They are simply the bare minimum required to encode the pertaining video category. How to make good DVD rips or fine-tune libavcodec for maximal quality is not within the scope of this document. There are a couple of footnotes specific to this guide, linked like this: [1] How to tell what type of video you have Progressive Progressive video was originally filmed at 24000/1001 fps, and stored on the DVD without alteration. When you play a progressive DVD in MPlayer, MPlayer will print the following line as soon as the movie begins to play: demux_mpg: 24000/1001 fps progressive NTSC content detected, switching framerate. From this point forward, demux_mpg should never say it finds "30000/1001 fps NTSC content." When you watch progressive video, you should never see any interlacing. Beware, however, because sometimes there is a tiny bit of telecine mixed in where you would not expect. I have encountered TV show DVDs that have one second of telecine at every scene change, or at seemingly random places. I once watched a DVD that had a progressive first half, and the second half was telecined. If you want to be really thorough, you can scan the entire movie: mplayer dvd://1 -nosound -vo null -benchmark Using makes MPlayer play the movie as quickly as it possibly can; still, depending on your hardware, it can take a while.
Every time demux_mpg reports a framerate change, the line immediately above will show you the time at which the change occurred. Sometimes progressive video on DVDs is referred to as "soft-telecine" because it is intended to be telecined by the DVD player. Telecined Telecined video was originally filmed at 24000/1001, but was telecined before it was written to the DVD. MPlayer does not (ever) report any framerate changes when it plays telecined video. Watching a telecined video, you will see interlacing artifacts that seem to "blink": they repeatedly appear and disappear. You can look closely at this by mplayer dvd://1 Seek to a part with motion. Use the . key to step forward one frame at a time. Look at the pattern of interlaced-looking and progressive-looking frames. If the pattern you see is PPPII,PPPII,PPPII,... then the video is telecined. If you see some other pattern, then the video may have been telecined using some non-standard method; MEncoder cannot losslessly convert non-standard telecine to progressive. If you do not see any pattern at all, then it is most likely interlaced. Sometimes telecined video on DVDs is referred to as "hard-telecine". Since hard-telecine is already 60000/1001 fields per second, the DVD player plays the video without any manipulation. Interlaced Interlaced video was originally filmed at 60000/1001 fields per second, and stored on the DVD as 30000/1001 frames per second. The interlacing effect (often called "combing") is a result of combining pairs of fields into frames. Each field is supposed to be 1/(60000/1001) seconds apart, and when they are displayed simultaneously the difference is apparent. As with telecined video, MPlayer should not ever report any framerate changes when playing interlaced content. When you view an interlaced video closely by frame-stepping with the . key, you will see that every single frame is interlaced. 
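The PPPII pattern described for telecined video follows from how film frames are spread over fields. Here is a minimal sketch, assuming the common cadence in which film frames alternately contribute two and three fields, and in which consecutive fields are then paired into stored frames. The function name telecine_pattern is hypothetical, and letters stand in for distinct film frames.

```python
from itertools import cycle

def telecine_pattern(film_frames, cadence=(2, 3)):
    """Label each stored frame P (both fields from one film frame) or I (mixed)."""
    fields = []
    # Each film frame is repeated for 2 or 3 fields, alternating (assumed cadence).
    for frame, repeat in zip(film_frames, cycle(cadence)):
        fields.extend([frame] * repeat)
    # Consecutive fields (first, second) are combined into one stored frame.
    pairs = zip(fields[0::2], fields[1::2])
    return "".join("P" if first == second else "I" for first, second in pairs)

print(telecine_pattern("ABCD"))      # PPIIP
print(telecine_pattern("ABCDEFGH"))  # PPIIPPPIIP
```

Four film frames become five stored frames, two of which mix fields from different film frames. Over a longer run the labels repeat with a period of five, which is the PPPII pattern seen when frame-stepping (the starting point within the cycle depends on where you begin). Truly interlaced material has no such cycle, since every frame mixes two moments in time.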
Mixed progressive and telecine All of a "mixed progressive and telecine" video was originally 24000/1001 frames per second, but some parts of it ended up being telecined. When MPlayer plays this category, it will (often repeatedly) switch back and forth between "30000/1001 fps NTSC" and "24000/1001 fps progressive NTSC". Watch the bottom of MPlayer's output to see these messages. You should check the "30000/1001 fps NTSC" sections to make sure they are actually telecine, and not just interlaced. Mixed progressive and interlaced In "mixed progressive and interlaced" content, progressive and interlaced video have been spliced together. This category looks just like "mixed progressive and telecine", until you examine the 30000/1001 fps sections and see that they do not have the telecine pattern. How to encode each category As I mentioned in the beginning, example MEncoder lines below are not meant to actually be used; they only demonstrate the minimum parameters to properly encode each category. Progressive Progressive video requires no special filtering to encode. The only parameter you need to be sure to use is . Otherwise, MEncoder will try to encode at 30000/1001 fps and will duplicate frames. mencoder dvd://1 -nosound -ovc lavc -ofps 24000/1001 It is often the case, however, that a video that looks progressive actually has very short parts of telecine mixed in. Unless you are sure, it is safest to treat the video as mixed progressive and telecine. The performance loss is small [3]. Telecined Telecine can be reversed to retrieve the original 24000/1001 content, using a process called inverse-telecine. MPlayer contains several filters to accomplish this; the best filter, , is described in the mixed progressive and telecine section. Interlaced For most practical cases it is not possible to retrieve a complete progressive video from interlaced content. 
The only way to do so without losing half of the vertical resolution is to double the framerate and try to "guess" what ought to make up the corresponding lines for each field (this has drawbacks - see method 3). Encode the video in interlaced form. Normally, interlacing wreaks havoc with the encoder's ability to compress well, but libavcodec has two parameters specifically for dealing with storing interlaced video a bit better: and . Also, using is strongly recommended [2] because it will encode macroblocks as non-interlaced in places where there is no motion. Note that is NOT needed here. mencoder dvd://1 -nosound -ovc lavc -lavcopts ildct:ilme:mbd=2 Use a deinterlacing filter before encoding. There are several of these filters available to choose from, each with its own advantages and disadvantages. Consult to see what is available (grep for "deint"), and search the MPlayer mailing lists to find many discussions about the various filters. Again, the framerate is not changing, so no . Also, deinterlacing should be done after cropping [1] and before scaling. mencoder dvd://1 -nosound -vf pp=lb -ovc lavc Unfortunately, this option is buggy with MEncoder; it ought to work well with MEncoder G2, but that is not here yet. You might experience crashes. Anyway, the purpose of is to create a full frame out of each field, which makes the framerate 60000/1001. The advantage of this approach is that no data is ever lost; however, since each frame comes from only one field, the missing lines have to be interpolated somehow. There are no very good methods of generating the missing data, and so the result will look a bit similar to when using some deinterlacing filters. Generating the missing lines creates other issues, as well, simply because the amount of data doubles. So, higher encoding bitrates are required to maintain quality, and more CPU power is used for both encoding and decoding. tfields has several different options for how to create the missing lines of each frame.
If you use this method, then consult the manual and choose whichever option looks best for your material. Note that when using you have to specify both and to be twice the framerate of your original source. mencoder dvd://1 -nosound -vf tfields=2 -ovc lavc -fps 60000/1001 -ofps 60000/1001 If you plan on downscaling dramatically, you can extract and encode only one of the two fields. Of course, you will lose half the vertical resolution, but if you plan on downscaling to at most 1/2 of the original, the loss will not matter much. The result will be a progressive 30000/1001 frames per second file. The procedure is to use , then crop [1] and scale appropriately. Remember that you will have to adjust the scale to compensate for the vertical resolution being halved. mencoder dvd://1 -nosound -vf field=0 -ovc lavc Mixed progressive and telecine In order to turn mixed progressive and telecine video into entirely progressive video, the telecined parts have to be inverse-telecined. There are three ways to accomplish this, described below. Note that you should always inverse-telecine before any rescaling; unless you really know what you are doing, inverse-telecine before cropping, too [1]. is needed here because the output video will be 24000/1001 frames per second. is designed to inverse-telecine telecined material while leaving progressive data alone. In order to work properly, must be followed by the filter or else MEncoder will crash. is, however, the cleanest and most accurate method available for encoding both telecine and "mixed progressive and telecine". mencoder dvd://1 -nosound -vf pullup,softskip -ovc lavc -ofps 24000/1001 An older method is to, rather than inverse-telecine the telecined parts, telecine the non-telecined parts and then inverse-telecine the whole video. Sound confusing? softpulldown is a filter that goes through a video and makes the entire file telecined. If we follow softpulldown with either or , the final result will be entirely progressive.
is needed. mencoder dvd://1 -nosound -vf softpulldown,ivtc=1 -ovc lavc -ofps 24000/1001 I have not used myself, but here is what D Richard Felker III has to say:
It is OK, but IMO it tries to deinterlace rather than doing inverse telecine too often (much like settop DVD players & progressive TVs) which gives ugly flickering and other artifacts. If you are going to use it, you at least need to spend some time tuning the options and watching the output first to make sure it is not messing up.
Mixed progressive and interlaced There are two options for dealing with this category, each of which is a compromise. You should decide based on the duration/location of each type. Treat it as progressive. The interlaced parts will look interlaced, and some of the interlaced fields will have to be dropped, resulting in a bit of uneven jumpiness. You can use a postprocessing filter if you want to, but it may slightly degrade the progressive parts. This option should definitely not be used if you want to eventually display the video on an interlaced device (with a TV card, for example). If you have interlaced frames in a 24000/1001 frames per second video, they will be telecined along with the progressive frames. Half of the interlaced "frames" will be displayed for three fields' duration (3/(60000/1001) seconds), resulting in a flicking "jump back in time" effect that looks quite bad. If you even attempt this, you must use a deinterlacing filter like or . It may also be a bad idea for progressive display, too. It will drop pairs of consecutive interlaced fields, resulting in a discontinuity that can be more visible than with the second method, which shows some progressive frames twice. 30000/1001 frames per second interlaced video is already a bit choppy because it really should be shown at 60000/1001 fields per second, so the duplicate frames do not stand out as much. Either way, it is best to consider your content and how you intend to display it. If your video is 90% progressive and you never intend to show it on a TV, you should favor a progressive approach. If it is only half progressive, you probably want to encode it as if it is all interlaced. Treat it as interlaced. Some frames of the progressive parts will need to be duplicated, resulting in uneven jumpiness. Again, deinterlacing filters may slightly degrade the progressive parts.
Footnotes About cropping: Video data on DVDs are stored in a format called YUV 4:2:0. In YUV video, luma ("brightness") and chroma ("color") are stored separately. Because the human eye is somewhat less sensitive to color than it is to brightness, in a YUV 4:2:0 picture there is only one chroma pixel for every four luma pixels. In a progressive picture, each square of four luma pixels (two on each side) has one common chroma pixel. You must crop progressive YUV 4:2:0 to even resolutions, and use even offsets. For example, is OK but is not. When you are dealing with interlaced YUV 4:2:0, the situation is a bit more complicated. Instead of every four luma pixels in the frame sharing a chroma pixel, every four luma pixels in each field share a chroma pixel. When fields are interlaced to form a frame, each scanline is one pixel high. Now, instead of all four luma pixels being in a square, there are two pixels side-by-side, and the other two pixels are side-by-side two scanlines down. The two luma pixels in the intermediate scanline are from the other field, and so share a different chroma pixel with two luma pixels two scanlines away. All this confusion makes it necessary to have vertical crop dimensions and offsets be multiples of four. Horizontal can stay even. For telecined video, I recommend that cropping take place after inverse telecining. Once the video is progressive you only need to crop by even numbers. If you really want to gain the slight speedup that cropping first may offer, you must crop vertically by multiples of four or else the inverse-telecine filter will not have proper data. For interlaced (not telecined) video, you must always crop vertically by multiples of four unless you use before cropping. About encoding parameters and quality: Just because I recommend here does not mean it should not be used elsewhere. 
Along with , is one of the two libavcodec options that increases quality the most, and you should always use at least those two unless the drop in encoding speed is prohibitive (e.g. realtime encoding). There are many other options to libavcodec that increase encoding quality (and decrease encoding speed) but that is beyond the scope of this document. About the performance of pullup: It is safe to use (along with ) on progressive video, and is usually a good idea unless the source has been definitively verified to be entirely progressive. The performance loss is small for most cases. On a bare-minimum encode, causes MEncoder to be 50% slower. Adding sound processing and advanced overshadows that difference, bringing the performance decrease of using down to 2%.
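The cropping rules from the footnote about cropping can be expressed as a small check. This is only a sketch of the rule as stated there (even sizes and offsets for progressive YUV 4:2:0, vertical multiples of four for interlaced). The function name crop_is_safe and the sample rectangles are hypothetical and are not taken from MEncoder itself.

```python
def crop_is_safe(width, height, x, y, interlaced):
    """Check a width:height:x:y crop rectangle against YUV 4:2:0 chroma alignment."""
    # Horizontally, sizes and offsets only need to be even in both cases.
    horizontal_ok = width % 2 == 0 and x % 2 == 0
    # Vertically, interlaced material needs multiples of four, progressive only even.
    vertical_step = 4 if interlaced else 2
    vertical_ok = height % vertical_step == 0 and y % vertical_step == 0
    return horizontal_ok and vertical_ok

# Hypothetical rectangles, given as width, height, x offset, y offset.
print(crop_is_safe(716, 380, 2, 26, interlaced=False))  # True
print(crop_is_safe(716, 380, 3, 26, interlaced=False))  # False: odd x offset
print(crop_is_safe(716, 380, 2, 26, interlaced=True))   # False: y offset not a multiple of 4
print(crop_is_safe(716, 380, 2, 28, interlaced=True))   # True
```

A rectangle that is fine for progressive material can therefore still be wrong for interlaced or not-yet-inverse-telecined material, which is why the order of cropping and inverse telecine matters.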