mirror of https://github.com/mpv-player/mpv
1076 lines
37 KiB
Plaintext
1076 lines
37 KiB
Plaintext
==================================
|
|
NUT Open Container Format 20061104
|
|
==================================
|
|
|
|
|
|
|
|
Intro:
|
|
======
|
|
|
|
NUT is a free multimedia container format for storage of audio, video,
|
|
subtitles and related user defined streams, it provides exact timestamps for
|
|
synchronization and seeking, is simple, has low overhead and can recover
|
|
in case of errors in the stream.
|
|
|
|
Other common multimedia container formats are AVI, Ogg, Matroska, MP4, MOV
|
|
ASF, MPEG-PS, MPEG-TS.
|
|
|
|
|
|
Features / goals:
|
|
(supported by the format, not necessarily by a specific implementation)
|
|
|
|
Simplicity
|
|
Use the same encoding for nearly all fields.
|
|
Simple decoding, so slow CPUs (and embedded systems) can handle it.
|
|
|
|
Extensibility
|
|
No limit for the possible values of all fields (using universal vlc).
|
|
Allow adding of new headers in the future.
|
|
Allow adding more fields at the end of headers.
|
|
|
|
Compactness
|
|
~0.2% overhead for normal bitrates.
|
|
The index is <100kb per hour.
|
|
A typical file header is about 100 bytes (audio + video headers together).
|
|
A packet header is about ~1-5 bytes.
|
|
|
|
Error resistance
|
|
Seeking / playback is possible without an index.
|
|
Headers & index can be repeated.
|
|
Damaged files can be played back with minimal data loss and fast
|
|
resynchronization times.
|
|
|
|
The specification is frozen. All files following the specification will be
|
|
compatible unless the specification is unfrozen.
|
|
|
|
|
|
Definitions:
|
|
============
|
|
|
|
MUST The specific part must be done to conform to this standard.
|
|
SHOULD It is recommended to be done that way, but not strictly required.
|
|
|
|
keyframe
|
|
A keyframe is a frame from which you can start decoding, a more
|
|
exact definition is below
|
|
The nth frame is a keyframe if and only if frames n, n+1, ... in
|
|
presentation order (that are all frames with a pts >= frame[n].pts) can
|
|
be decoded successfully without reference to frames prior n in storage
|
|
order (that are all frames with a dts < frame[n].dts).
|
|
If no such frames exist (for example due to using overlapped transforms
|
|
like the MDCT in an audio codec), then the definition shall be extended
|
|
by dropping n out of the set of frames which must be decodable, if this
|
|
is still insufficient then n+1 shall be dropped, and so on until there is
|
|
a keyframe.
|
|
Every frame which is marked as a keyframe MUST be a keyframe according to
|
|
the definition above, a muxer MUST mark every frame it knows is a keyframe
|
|
as such, a muxer SHOULD NOT analyze future frames to determine the
|
|
keyframe status of the current frame but instead just set the frame as
|
|
non-keyframe.
|
|
(FIXME maybe move somewhere else?)
|
|
pts
|
|
Presentation time of the first frame/sample that is completed by decoding
|
|
the coded frame.
|
|
dts
|
|
The time when a frame is input into a synchronous 1-in-1-out decoder.
|
|
|
|
|
|
Syntax:
|
|
=======
|
|
|
|
Since NUT heavily uses variable length fields, the simplest way to describe it
|
|
is using a pseudocode approach.
|
|
|
|
|
|
|
|
Conventions:
|
|
============
|
|
|
|
The data types have a name, used in the bitstream syntax description, a short
|
|
text description and a pseudocode (functional) definition, optional notes may
|
|
follow:
|
|
|
|
name (text description)
|
|
functional definition
|
|
[Optional notes]
|
|
|
|
The bitstream syntax elements have a tagname and a functional definition, they
|
|
are presented in a bottom-up approach, again optional notes may follow and
|
|
are reproduced in the tag description:
|
|
|
|
name: (optional note)
|
|
functional definition
|
|
[Optional notes]
|
|
|
|
The in-depth tag description follows the bitstream syntax.
|
|
The functional definition has a C-like syntax.
|
|
|
|
|
|
|
|
Type definitions:
|
|
=================
|
|
|
|
f(n) (n fixed bits in big-endian order)
|
|
u(n) (unsigned number encoded in n bits in MSB-first order)
|
|
|
|
v (variable length value, unsigned)
|
|
value=0
|
|
do{
|
|
more_data u(1)
|
|
data u(7)
|
|
value= 128*value + data
|
|
}while(more_data)
|
|
|
|
s (variable length value, signed)
|
|
temp v
|
|
temp++
|
|
if(temp&1) value= -(temp>>1)
|
|
else value= (temp>>1)
|
|
|
|
b (binary data or string, to be use in vb, see below)
|
|
for(i=0; i<length; i++){
|
|
data[i] u(8)
|
|
}
|
|
[Note: strings MUST be encoded in UTF-8]
|
|
[Note: the character NUL (U+0000) is not legal within
|
|
or at the end of a string.]
|
|
|
|
vb (variable length binary data or string)
|
|
length v
|
|
value b
|
|
|
|
t (v coded universal timestamp)
|
|
tmp v
|
|
id= tmp % time_base_count
|
|
value= (tmp / time_base_count) * time_base[id]
|
|
|
|
|
|
Bitstream syntax:
|
|
=================
|
|
|
|
file:
|
|
file_id_string
|
|
while(!eof){
|
|
if(next_byte == 'N'){
|
|
packet_header
|
|
switch(startcode){
|
|
case main_startcode: main_header; break;
|
|
case stream_startcode:stream_header; break;
|
|
case info_startcode: info_packet; break;
|
|
case index_startcode: index; break;
|
|
case syncpoint_startcode: syncpoint; break;
|
|
}
|
|
packet_footer
|
|
}else
|
|
frame
|
|
}
|
|
|
|
The structure of an undamaged file should look like the following, but
|
|
demuxers should be flexible and be able to deal with damaged headers so the
|
|
above is a better loop in practice (not to mention it is simpler).
|
|
Note: Demuxers MUST be able to deal with new and unknown headers.
|
|
|
|
file:
|
|
file_id_string
|
|
while(!eof){
|
|
packet_header, main_header, packet_footer
|
|
reserved_headers
|
|
for(i=0; i<stream_count; i++){
|
|
packet_header, stream_header, packet_footer
|
|
reserved_headers
|
|
}
|
|
while(next_code == info_startcode){
|
|
packet_header, info_packet, packet_footer
|
|
reserved_headers
|
|
}
|
|
if(next_code == index_startcode){
|
|
packet_header, index_packet, packet_footer
|
|
}
|
|
if (!eof) while(next_code != main_startcode){
|
|
if(next_code == syncpoint_startcode){
|
|
packet_header, syncpoint, packet_footer
|
|
}
|
|
frame
|
|
reserved_headers
|
|
}
|
|
}
|
|
|
|
|
|
Common elements:
|
|
----------------
|
|
|
|
reserved_bytes:
|
|
for(i=0; i<forward_ptr - length_of_non_reserved; i++)
|
|
reserved u(8)
|
|
[A demuxer MUST ignore any reserved bytes.
|
|
A muxer MUST NOT write any reserved bytes, as this would make it
|
|
impossible to add new fields at the end of packets in the future
|
|
in a compatible way.]
|
|
|
|
packet_header
|
|
startcode f(64)
|
|
forward_ptr v
|
|
if(forward_ptr > 4096)
|
|
header_checksum u(32)
|
|
|
|
packet_footer
|
|
checksum u(32)
|
|
|
|
reserved_headers
|
|
while(next_byte == 'N' && next_code != main_startcode
|
|
&& next_code != stream_startcode
|
|
&& next_code != info_startcode
|
|
&& next_code != index_startcode
|
|
&& next_code != syncpoint_startcode){
|
|
packet_header
|
|
reserved_bytes
|
|
packet_footer
|
|
}
|
|
|
|
Headers:
|
|
|
|
main_header:
|
|
version v
|
|
stream_count v
|
|
max_distance v
|
|
time_base_count v
|
|
for(i=0; i<time_base_count; i++)
|
|
time_base_num v
|
|
time_base_denom v
|
|
time_base[i]= time_base_num/time_base_denom
|
|
tmp_pts=0
|
|
tmp_mul=1
|
|
tmp_stream=0
|
|
for(i=0; i<256; ){
|
|
tmp_flag v
|
|
tmp_fields v
|
|
if(tmp_fields>0) tmp_pts s
|
|
if(tmp_fields>1) tmp_mul v
|
|
if(tmp_fields>2) tmp_stream v
|
|
if(tmp_fields>3) tmp_size v
|
|
else tmp_size=0
|
|
if(tmp_fields>4) tmp_res v
|
|
else tmp_res=0
|
|
if(tmp_fields>5) count v
|
|
else count= tmp_mul - tmp_size
|
|
for(j=6; j<tmp_fields; j++){
|
|
tmp_reserved[i] v
|
|
}
|
|
for(j=0; j<count && i<256; j++, i++){
|
|
if (i == 'N') {
|
|
flags[i]= FLAG_INVALID;
|
|
j--;
|
|
continue;
|
|
}
|
|
flags[i]= tmp_flag;
|
|
stream_id[i]= tmp_stream;
|
|
data_size_mul[i]= tmp_mul;
|
|
data_size_lsb[i]= tmp_size + j;
|
|
pts_delta[i]= tmp_pts;
|
|
reserved_count[i]= tmp_res;
|
|
}
|
|
}
|
|
reserved_bytes
|
|
|
|
stream_header:
|
|
stream_id v
|
|
stream_class v
|
|
fourcc vb
|
|
time_base_id v
|
|
msb_pts_shift v
|
|
max_pts_distance v
|
|
decode_delay v
|
|
stream_flags v
|
|
codec_specific_data vb
|
|
if(stream_class == video){
|
|
width v
|
|
height v
|
|
sample_width v
|
|
sample_height v
|
|
colorspace_type v
|
|
}else if(stream_class == audio){
|
|
samplerate_num v
|
|
samplerate_denom v
|
|
channel_count v
|
|
}
|
|
reserved_bytes
|
|
|
|
Basic Packets:
|
|
|
|
frame:
|
|
frame_code f(8)
|
|
frame_flags= flags[frame_code]
|
|
frame_res= reserved_count[frame_code]
|
|
if(frame_flags&FLAG_CODED){
|
|
coded_flags v
|
|
frame_flags ^= coded_flags
|
|
}
|
|
if(frame_flags&FLAG_STREAM_ID){
|
|
stream_id v
|
|
}
|
|
if(frame_flags&FLAG_CODED_PTS){
|
|
coded_pts v
|
|
}
|
|
if(frame_flags&FLAG_SIZE_MSB){
|
|
data_size_msb v
|
|
}
|
|
if(frame_flags&FLAG_RESERVED)
|
|
frame_res v
|
|
for(i=0; i<frame_res; i++)
|
|
reserved v
|
|
if(frame_flags&FLAG_CHECKSUM){
|
|
checksum u(32)
|
|
}
|
|
data
|
|
|
|
index:
|
|
max_pts t
|
|
syncpoints v
|
|
for(i=0; i<syncpoints; i++){
|
|
syncpoint_pos_div16 v
|
|
}
|
|
for(i=0; i<stream_count; i++){
|
|
last_pts= -1
|
|
for(j=0; j<syncpoints; ){
|
|
x v
|
|
type= x & 1
|
|
x>>=1
|
|
n=j
|
|
if(type){
|
|
flag= x & 1
|
|
x>>=1
|
|
while(x--)
|
|
has_keyframe[n++][i]=flag
|
|
has_keyframe[n++][i]=!flag;
|
|
}else{
|
|
while(x != 1){
|
|
has_keyframe[n++][i]=x&1;
|
|
x>>=1;
|
|
}
|
|
}
|
|
for(; j<n && j<syncpoints; j++){
|
|
if (!has_keyframe[j][i]) continue
|
|
A v
|
|
if(!A){
|
|
A v
|
|
B v
|
|
eor_pts[j][i] = last_pts + A + B
|
|
}else
|
|
B=0
|
|
keyframe_pts[j][i] = last_pts + A
|
|
last_pts += A + B
|
|
}
|
|
}
|
|
}
|
|
reserved_bytes
|
|
index_ptr u(64)
|
|
|
|
info_packet:
|
|
stream_id_plus1 v
|
|
chapter_id s (Note: Due to a typo this was v
|
|
until 2006-11-04.)
|
|
chapter_start t
|
|
chapter_len v
|
|
count v
|
|
for(i=0; i<count; i++){
|
|
name vb
|
|
value s
|
|
if (value==-1){
|
|
type= "UTF-8"
|
|
value vb
|
|
}else if (value==-2){
|
|
type vb
|
|
value vb
|
|
}else if (value==-3){
|
|
type= "s"
|
|
value s
|
|
}else if (value==-4){
|
|
type= "t"
|
|
value t
|
|
}else if (value<-4){
|
|
type= "r"
|
|
value.den= -value-4
|
|
value.num s
|
|
}else{
|
|
type= "v"
|
|
}
|
|
}
|
|
reserved_bytes
|
|
|
|
syncpoint:
|
|
global_key_pts t
|
|
back_ptr_div16 v
|
|
reserved_bytes
|
|
|
|
Complete definition:
|
|
|
|
|
|
Tag description:
|
|
----------------
|
|
|
|
file_id_string
|
|
"nut/multimedia container\0"
|
|
The very first thing in every NUT file, useful for identifying NUT files.
|
|
|
|
*_startcode (f(64))
|
|
all startcodes start with 'N'
|
|
|
|
main_startcode (f(64))
|
|
0x7A561F5F04ADULL + (((uint64_t)('N'<<8) + 'M')<<48)
|
|
|
|
stream_startcode (f(64))
|
|
0x11405BF2F9DBULL + (((uint64_t)('N'<<8) + 'S')<<48)
|
|
|
|
syncpoint_startcode (f(64))
|
|
0xE4ADEECA4569ULL + (((uint64_t)('N'<<8) + 'K')<<48)
|
|
|
|
index_startcode (f(64))
|
|
0xDD672F23E64EULL + (((uint64_t)('N'<<8) + 'X')<<48)
|
|
|
|
info_startcode (f(64))
|
|
0xAB68B596BA78ULL + (((uint64_t)('N'<<8) + 'I')<<48)
|
|
|
|
version (v)
|
|
NUT version. The current value is 3. All lower values are pre-freeze.
|
|
|
|
stream_count (v)
|
|
number of streams in this file
|
|
|
|
time_base_count (v)
|
|
number of different time bases in this file
|
|
This MUST NOT be 0.
|
|
|
|
forward_ptr (v)
|
|
Size of the packet data (exactly the distance from the first byte
|
|
after the packet_header to the first byte of the next packet).
|
|
Every NUT packet contains a forward_ptr immediately after its startcode
|
|
with the exception of frame_code-based packets. The forward pointer
|
|
can be used to skip over the packet without decoding its contents.
|
|
|
|
max_distance (v)
|
|
maximum distance between startcodes. If p1 and p2 are the byte
|
|
positions of the first byte of two consecutive startcodes, then
|
|
p2-p1 MUST be less than or equal to max_distance unless the entire
|
|
span from p1 to p2 comprises a single packet or a syncpoint
|
|
followed by a single frame. This imposition places efficient upper
|
|
bounds on seek operations and allows for the detection of damaged
|
|
frame headers, should a chain of frame headers pass max_distance
|
|
without encountering any startcode.
|
|
|
|
Syncpoints SHOULD be placed immediately before a keyframe if the
|
|
previous frame of the same stream was a non-keyframe, unless such
|
|
non-keyframe - keyframe transitions are very frequent.
|
|
|
|
SHOULD be set to <=32768.
|
|
If the stored value is >65536 then max_distance MUST be set to 65536.
|
|
|
|
This is also half the maximum frame size without a checksum after the
|
|
frame header.
|
|
|
|
|
|
max_pts_distance (v)
|
|
Maximum absolute difference of the pts of the new frame from last_pts in
|
|
the timebase of the stream, without a checksum after the frame header.
|
|
A frame header MUST include a checksum if abs(pts-last_pts) is
|
|
strictly greater than max_pts_distance.
|
|
Note that last_pts is not necessarily the pts of the last frame
|
|
on the same stream, as it is altered by syncpoint timestamps.
|
|
SHOULD NOT be higher than 1/timebase.
|
|
|
|
stream_id (v)
|
|
Stream identifier
|
|
stream_id MUST be < stream_count
|
|
|
|
stream_class (v)
|
|
0 video
|
|
1 audio
|
|
2 subtitles
|
|
3 userdata
|
|
Note: The remaining values are reserved and MUST NOT be used.
|
|
A demuxer MUST ignore streams with reserved classes.
|
|
|
|
fourcc (vb)
|
|
identification for the codec
|
|
example: "H264"
|
|
MUST contain 2 or 4 bytes, note, this might be increased in the future
|
|
if needed.
|
|
The ID values used are the same as in AVI, so if a codec uses a specific
|
|
FourCC in AVI then the same FourCC MUST be used here.
|
|
|
|
time_base_num (v) / time_base_denom (v) = time_base
|
|
the length of a timer tick in seconds, this MUST be equal to the 1/fps
|
|
if FLAG_FIXED_FPS is set
|
|
time_base_num and time_base_denom MUST NOT be 0
|
|
time_base_num and time_base_denom MUST be relatively prime
|
|
time_base_denom MUST be < 2^31
|
|
examples:
|
|
fps time_base_num time_base_denom
|
|
30 1 30
|
|
29.97 1001 30000
|
|
23.976 1001 24000
|
|
There MUST NOT be 2 identical timebases in a file.
|
|
There SHOULD NOT be more timebases than streams.
|
|
|
|
time_base_id (v)
|
|
index into the time_base table
|
|
MUST be < time_base_count.
|
|
|
|
convert_ts
|
|
To switch from 2 different timebases, the following calculation is
|
|
defined:
|
|
|
|
ln = from_time_base_num*to_time_base_denom
|
|
sn = from_timestamp
|
|
d1 = from_time_base_denom
|
|
d2 = to_time_base_num
|
|
timestamp = (ln/d1*sn + ln%d1*sn/d1)/d2
|
|
Note: This calculation MUST be done with unsigned 64 bit integers, and
|
|
is equivalent to (ln*sn)/(d1*d2) but this would require a 96 bit integer.
|
|
|
|
compare_ts
|
|
Compares timestamps from 2 different timebases,
|
|
if a is before b then compare_ts(a, b) = -1
|
|
if a is after b then compare_ts(a, b) = 1
|
|
else compare_ts(a, b) = 0
|
|
|
|
Care must be taken that this is done exactly with no rounding errors,
|
|
simply casting to float or double and doing the obvious
|
|
a*timebase > b*timebase is not compliant or correct, neither is the
|
|
same with integers, and
|
|
a*a_timebase.num*b_timebase.den > b*b_timebase.num*a_timebase.den
|
|
will overflow. One possible implementation which shouldn't overflow
|
|
within the range of legal timestamps and timebases is:
|
|
|
|
if (convert_ts(a, a_timebase, b_timebase) < b) return -1;
|
|
if (convert_ts(b, b_timebase, a_timebase) < a) return 1;
|
|
return 0;
|
|
|
|
msb_pts_shift (v)
|
|
amount of bits in lsb_pts
|
|
MUST be <16.
|
|
|
|
decode_delay (v)
|
|
Size of the reordering buffer used to convert pts to dts.
|
|
Codecs which do not support B-frames normally use 0.
|
|
MPEG-1/MPEG-2-style codecs with B-frames use 1.
|
|
H.264-style B-pyramid uses 2.
|
|
H.264 and future codecs might need values >2.
|
|
Audio codecs generally use 0. (We are not aware of any, but it
|
|
is theoretically possible that a codec might need a value >0.)
|
|
decode_delay MUST NOT be set higher than necessary for a codec.
|
|
|
|
stream_flags (v)
|
|
Bit Name Description
|
|
1 FLAG_FIXED_FPS indicates that the fps is fixed
|
|
|
|
codec_specific_data (vb)
|
|
Private global data for a codec (could be huffman tables or ...).
|
|
If a codec has a global header it SHOULD be placed in here instead of
|
|
at the start of every keyframe.
|
|
The exact format is specified in the codec specification.
|
|
For H.264 the NAL units MUST be formatted as in a bytestream
|
|
(with 00 00 01 prefixes).
|
|
codec_specific_data SHOULD contain exactly the essential global packets
|
|
needed to decode a stream, more specifically it SHOULD NOT contain packets
|
|
which contain only non essential metadata like author, title, ...
|
|
It also MUST NOT contain normal packets which cause the reference decoder
|
|
to generate any specific decoded samples.
|
|
The encoder name and version shall be considered essential as it is very
|
|
useful to work around possible encoder bugs.
|
|
The global headers MUST consist of the normal
|
|
sequence of header packets required for codec initialization, in the
|
|
order defined in the codec spec. An implementation MAY strip metadata and
|
|
other redundant information not necessary for correct playback from the
|
|
global headers as long as no incorrect values are stored and as long as
|
|
the stripped result is not less valid per codec spec as before stripping.
|
|
|
|
frame_code (f(8))
|
|
frame_code is an 8-bit field which exists before every frame, it can
|
|
store part of the size of the frame, the stream number, the timestamp
|
|
and some flags amongst other things. What is not directly stored
|
|
in it but is needed is stored in various fields immediately after it.
|
|
The values stored in it can be found in the main header.
|
|
The value 78 ('N') is forbidden to ensure that the byte is always
|
|
different from the first byte of any startcode.
|
|
A muxer SHOULD mark 0x00 and 0xFF as invalid to improve error
|
|
detection.
|
|
|
|
flags[frame_code], frame_flags (v)
|
|
Bit Name Description
|
|
0 FLAG_KEY If set, the frame is a keyframe.
|
|
1 FLAG_EOR If set, the stream has no relevance on
|
|
presentation. (EOR)
|
|
3 FLAG_CODED_PTS If set, coded_pts is in the frame header.
|
|
4 FLAG_STREAM_ID If set, stream_id is coded in the frame header.
|
|
5 FLAG_SIZE_MSB If set, data_size_msb at the frame header,
|
|
otherwise data_size_msb is 0.
|
|
6 FLAG_CHECKSUM If set, the frame header contains a checksum.
|
|
7 FLAG_RESERVED If set, reserved_count is coded in the frame header.
|
|
12 FLAG_CODED If set, coded_flags are stored in the frame header.
|
|
13 FLAG_INVALID If set, frame_code is invalid.
|
|
|
|
EOR frames MUST be zero-length and must be set keyframe.
|
|
All streams SHOULD end with EOR, where the pts of the EOR indicates the
|
|
end presentation time of the final frame.
|
|
An EOR set stream is unset by the first content frames.
|
|
EOR can only be unset in streams with zero decode_delay .
|
|
FLAG_CHECKSUM MUST be set if the frame's data_size is strictly greater than
|
|
2*max_distance or the difference abs(pts-last_pts) is strictly greater than
|
|
max_pts_distance (where pts represents this frame's pts and last_pts is
|
|
defined as below).
|
|
|
|
last_pts
|
|
The timestamp of the last frame with the same stream_id as the current.
|
|
If there is no such frame between the last syncpoint and the current
|
|
frame then the syncpoint timestamp is used, see global_key_pts.
|
|
|
|
stream_id[frame_code] (v)
|
|
If FLAG_STREAM_ID is not set then this is the stream number for the
|
|
frame following this frame_code.
|
|
If FLAG_STREAM_ID is set then this value has no meaning.
|
|
MUST be <250.
|
|
|
|
data_size_mul[frame_code] (v)
|
|
If FLAG_SIZE_MSB is set then data_size_msb which is stored after the
|
|
frame code is multiplied with it and forms the more significant part
|
|
of the size of the following frame.
|
|
If FLAG_SIZE_MSB is not set then this field has no meaning.
|
|
MUST be <16384.
|
|
|
|
data_size_lsb[frame_code] (v)
|
|
The less significant part of the size of the following frame.
|
|
This added together with data_size_mul*data_size_msb is the size of
|
|
the following frame.
|
|
MUST be <16384.
|
|
|
|
pts_delta[frame_code] (s)
|
|
If FLAG_CODED_PTS is set in the flags of the current frame then this
|
|
value MUST be ignored, if FLAG_CODED_PTS is not set then pts_delta is the
|
|
difference between the current pts and last_pts.
|
|
MUST be <16384 and >-16384.
|
|
|
|
reserved_count[frame_code] (v)
|
|
MUST be <256.
|
|
|
|
data_size
|
|
The size of the following frame.
|
|
data_size = data_size_lsb + data_size_msb * data_size_mul ;
|
|
|
|
coded_pts (v)
|
|
If coded_pts < ( 1 << msb_pts_shift ) then it is an lsb
|
|
pts, otherwise it is a full pts + ( 1 << msb_pts_shift ).
|
|
lsb pts is converted to a full pts by:
|
|
mask = ( 1 << msb_pts_shift ) - 1;
|
|
delta = last_pts - mask / 2
|
|
pts = ( (pts_lsb - delta) & mask ) + delta
|
|
|
|
lsb_pts
|
|
Least significant bits of the pts in time_base precision.
|
|
Example: IBBP display order
|
|
keyframe pts=0 -> pts=0
|
|
frame lsb_pts=3 -> pts=3
|
|
frame lsb_pts=1 -> pts=1
|
|
frame lsb_pts=2 -> pts=2
|
|
...
|
|
keyframe msb_pts=257 -> pts=257
|
|
frame lsb_pts=255 -> pts=255
|
|
frame lsb_pts=0 -> pts=256
|
|
frame lsb_pts=4 -> pts=260
|
|
frame lsb_pts=2 -> pts=258
|
|
frame lsb_pts=3 -> pts=259
|
|
All pts values of keyframes of a single stream MUST be monotone.
|
|
|
|
dts
|
|
decoding timestamp
|
|
The dts of a frame is the timestamp of the first sample which is
|
|
output by a decoder when it is fed with the frame. Note that the
|
|
data output is not necessarily what is coded in the frame, but may
|
|
be data from previous frames.
|
|
dts is calculated by using a decode_delay + 1 sized buffer for each
|
|
stream, into which the current pts is inserted and the element with
|
|
the smallest value is removed. This is then the current dts.
|
|
This buffer is initialized with decode_delay - 1 elements.
|
|
|
|
pts of all frames in all streams MUST be bigger or equal to dts of all
|
|
previous frames in all streams, compared in common timebase. (EOR
|
|
frames are NOT exempt from this rule.)
|
|
dts of all frames MUST be bigger or equal to dts of all previous frames
|
|
in the same stream.
|
|
|
|
width (v) / height (v)
|
|
Width and height of the video in pixels.
|
|
MUST be set to the coded width/height, MUST NOT be 0.
|
|
|
|
sample_width (v) /sample_height (v) (aspect ratio)
|
|
sample_width is the horizontal distance between samples.
|
|
sample_width and sample_height MUST be relatively prime if not zero.
|
|
Both MUST be 0 if unknown otherwise both MUST be nonzero.
|
|
|
|
colorspace_type (v)
|
|
0 unknown
|
|
1 ITU Rec 624 / ITU Rec 601 Y range: 16..235 Cb/Cr range: 16..240
|
|
2 ITU Rec 709 Y range: 16..235 Cb/Cr range: 16..240
|
|
17 ITU Rec 624 / ITU Rec 601 Y range: 0..255 Cb/Cr range: 0..255
|
|
18 ITU Rec 709 Y range: 0..255 Cb/Cr range: 0..255
|
|
|
|
samplerate_num (v) / samplerate_denom (v) = samplerate
|
|
The number of samples per second, MUST NOT be 0.
|
|
|
|
crc32 checksum
|
|
Generator polynomial is 0x104C11DB7. Starting value is zero.
|
|
|
|
checksum (u(32))
|
|
crc32 checksum
|
|
The checksum is calculated for the area pointed to by forward_ptr
|
|
not including the checksum itself (from first byte after the
|
|
packet_header until last byte before the checksum).
|
|
For frame headers the checksum contains the framecode byte and all
|
|
following bytes up to the checksum itself.
|
|
|
|
header_checksum (u(32))
|
|
Checksum over the startcode and forward pointer.
|
|
|
|
Syncpoint tags:
|
|
---------------
|
|
|
|
back_ptr_div16 (v)
|
|
back_ptr = back_ptr_div16 * 16 + 15
|
|
back_ptr must point to a position up to 15 bytes before a syncpoint
|
|
startcode, relative to position of current syncpoint. The syncpoint
|
|
pointed to MUST be the closest syncpoint such that at least one keyframe
|
|
with a pts lower or equal to the current syncpoint's global_key_pts for
|
|
all streams lies between it and the current syncpoint.
|
|
|
|
A stream where EOR is set is to be ignored for back_ptr.
|
|
|
|
global_key_pts (t)
|
|
After a syncpoint, last_pts of each stream is to be set to:
|
|
last_pts[i] = convert_ts(global_key_pts, time_base[id], time_base[i])
|
|
|
|
global_key_pts MUST be bigger or equal to dts of all past frames across
|
|
all streams, and smaller or equal to pts of all future frames.
|
|
|
|
Index tags:
|
|
-----------
|
|
|
|
max_pts (t)
|
|
the highest pts in the entire file
|
|
|
|
syncpoints (v)
|
|
number of indexed syncpoints
|
|
|
|
syncpoint_pos_div16 (v)
|
|
The offset from the beginning of the file to up to 15 bytes before the
|
|
syncpoint referred to in this index entry. Relative to position of last
|
|
syncpoint.
|
|
|
|
has_keyframe
|
|
Indicates whether this stream has a keyframe between this syncpoint and
|
|
the last syncpoint.
|
|
|
|
keyframe_pts
|
|
The pts of the first keyframe for this stream in the region between the
|
|
2 syncpoints, in the stream's timebase. (EOR frames are also keyframes.)
|
|
|
|
eor_pts
|
|
Coded only if EOR is set at the position of the syncpoint. The pts of
|
|
that EOR. EOR is unset by the first keyframe after it.
|
|
|
|
index_ptr (u(64))
|
|
Length in bytes of the entire index, from the first byte of the
|
|
startcode until the last byte of the checksum.
|
|
Note: A demuxer can use this to find the index when it is written at
|
|
EOF, as index_ptr will always be 12 bytes before the end of file if
|
|
there is an index at all.
|
|
|
|
|
|
Info tags:
|
|
----------
|
|
|
|
stream_id_plus1 (v)
|
|
Stream this info packet applies to. If zero, packet applies to the
|
|
whole file.
|
|
|
|
chapter_id (s)
|
|
The ID of the chapter this packet applies to. If zero, the packet applies
|
|
to the whole file. Positive chapter_id values represent real chapters and
|
|
MUST NOT overlap.
|
|
A negative chapter_id indicates a sub region of the file and not a real
|
|
chapter. chapter_id MUST be unique to the region it represents.
|
|
chapter_id n MUST NOT be used unless there are at least n chapters in the
|
|
file.
|
|
|
|
chapter_start (t)
|
|
timestamp of start of chapter
|
|
|
|
chapter_len (v)
|
|
Length of chapter in the same timebase as chapter_start.
|
|
|
|
count (v)
|
|
number of name/value pairs in this info packet
|
|
|
|
type
|
|
for example: "UTF8" -> string or "JPEG" -> JPEG image
|
|
"v" -> unsigned integer
|
|
"s" -> signed integer
|
|
"r" -> rational
|
|
Note: Nonstandard fields should be prefixed by "X-".
|
|
Note: MUST be less than 6 byte long (might be increased to 64 later).
|
|
|
|
info packet types
|
|
The name of the info entry. Valid names are
|
|
"Author"
|
|
"Description"
|
|
"Copyright"
|
|
"Encoder"
|
|
The name & version of the software used for encoding.
|
|
"Title"
|
|
"Cover" (allowed types are "PNG" and "JPEG")
|
|
image of the (CD, DVD, VHS, ..) cover (preferably PNG or JPEG)
|
|
"Source"
|
|
"DVD", "VCD", "CD", "MD", "FM radio", "VHS", "TV", "LD"
|
|
Optional: Appended PAL, NTSC, SECAM, ... in parentheses.
|
|
"SourceContainer"
|
|
"nut", "mkv", "mov", "avi", "ogg", "rm", "mpeg-ps", "mpeg-ts", "raw"
|
|
"SourceCodecTag"
|
|
The source codec ID like a FourCC which was used to store a specific
|
|
stream in its SourceContainer.
|
|
"CaptureDevice"
|
|
"BT878", "BT848", "webcam", ... (or more precise names)
|
|
"CreationTime"
|
|
"2003-01-20 20:13:15Z", ...
|
|
(ISO 8601 format, see http://www.cl.cam.ac.uk/~mgk25/iso-time.html)
|
|
Note: Do not forget the timezone.
|
|
"Keywords"
|
|
"Language"
|
|
An ISO 639-2 (three-letter) language code, optionally followed by an
|
|
ISO 3166-1 country code that is separated from the language
|
|
code by a hyphen. All codes defined in ISO 639-2 are allowed,
|
|
including "und" (Undetermined), "mul" (Multiple languages).
|
|
See http://www.loc.gov/standards/iso639-2/
|
|
and http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1/en_listp1.html
|
|
the language code
|
|
A demuxer MUST ignore unknown language and country codes instead of
|
|
treating them as an error.
|
|
"Disposition"
|
|
"original", "dub" (translated), "comment", "lyrics", "karaoke"
|
|
Note: If someone needs some others, please tell us about them, so we
|
|
can add them to the official standard (if they are sane).
|
|
Note: Nonstandard fields should be prefixed by "X-".
|
|
Note: Names of fields SHOULD be in English if a word with the same
|
|
meaning exists in English.
|
|
Note: MUST be less than 64 bytes long.
|
|
|
|
value
|
|
value of this name/type pair
|
|
|
|
stuffing
|
|
0x80 can be placed in front of any type v entry for stuffing purposes.
|
|
Exceptions are the forward_ptr and all fields in the frame header where
|
|
a maximum of 8 stuffing bytes per field are allowed.
|
|
|
|
|
|
Structure:
|
|
----------
|
|
|
|
The headers MUST be in exactly the following order (to simplify demuxer design).
|
|
|
|
main header
|
|
stream_header (id=0)
|
|
stream_header (id=1)
|
|
...
|
|
stream_header (id=n)
|
|
|
|
Headers may be repeated, but if they are, then they MUST all be repeated
|
|
together and repeated headers MUST be identical.
|
|
|
|
Each set of repeated headers not at the beginning or end of the file SHOULD
|
|
be stored at the earliest possible position after 2^x where x is an integer
|
|
and the end of the file. So the headers may be repeated at 4102 if that is
|
|
the closest position after 2^12=4096 at which the headers can be placed.
|
|
|
|
Note: This allows an implementation reading the file to locate backup
|
|
headers in O(log filesize) time as opposed to O(filesize).
|
|
|
|
Headers MUST be placed at least at the start of the file and immediately before
|
|
the index or at the end of the file if there is no index.
|
|
Headers MUST be repeated at least twice (so they exist three times in a file).
|
|
|
|
There MUST be a syncpoint immediately before the first frame after any headers.
|
|
|
|
|
|
Index:
|
|
------
|
|
|
|
Note: With realtime streaming, there is no end, so no index there either.
|
|
Index MAY only be repeated after main headers.
|
|
If an index is written anywhere in the file, it MUST be written at end of
|
|
file as well.
|
|
|
|
|
|
Info:
|
|
-----
|
|
|
|
If an info packet is stored anywhere then a muxer MUST also store an identical
|
|
info packet after every main-stream-header set.
|
|
|
|
If a demuxer has seen several info packets with the same chapter_id and
|
|
stream_id then it MUST ignore all but the one with the highest position in
|
|
the file.
|
|
|
|
Demuxers SHOULD NOT search the whole file for info packets.
|
|
|
|
demuxer (non-normative):
|
|
------------------------
|
|
|
|
In the absence of a valid header at the beginning, players SHOULD search for
|
|
backup headers starting at offset 2^x; for each x players SHOULD end their
|
|
search at a particular offset when any startcode (including a syncpoint) is
|
|
found.
|
|
|
|
|
|
Seeking without an index (non-normative):
|
|
-----------------------------------------
|
|
A. backward seeking
|
|
1. Perform a binary search on the syncpoint timestamps finding the one
|
|
which is largest and <= the target timestamp.
|
|
B. forward seeking
|
|
1a. Perform a binary search on the syncpoint timestamps finding the one
|
|
which is smallest and >= the target timestamp.
|
|
1b. Perform a binary search on the syncpoint back pointers finding the
|
|
smallest one which has a back ptr >= the position of what was found in 1.
|
|
2. Follow the back pointer to the corresponding syncpoint.
|
|
|
|
Seeking with an index (non-normative):
|
|
--------------------------------------
|
|
The demuxer only has to find the appropriate keyframe in the index and
|
|
start demuxing from the previous syncpoint.
|
|
|
|
Note, more complicated seeking methods exist which are capable of quickly
|
|
seeking to the optimal point in the presence of an index even if only a
|
|
subset of all streams is active.
|
|
|
|
A muxer SHOULD place syncpoints so that that simple low complexity seeking
|
|
works with fine granularity. That is, syncpoints should be placed prior
|
|
to keyframes instead of non-keyframes and with high enough frequency
|
|
(once per second unless there are no keyframes between this and the previous
|
|
syncpoint).
|
|
|
|
Encoders SHOULD place keyframes so that the number of points where all
|
|
streams have a keyframe at the same time is maximized. This ensures that
|
|
seeking (complicated or not) does not need to demux and decode significant
|
|
amounts of data to reach a point where a presentable frame for each stream
|
|
is available after seeking.
|
|
|
|
|
|
Semantic requirements:
|
|
======================
|
|
|
|
If more than one stream of a given stream class is present, each one SHOULD
|
|
have info tags specifying disposition, and if applicable, language.
|
|
It often highly improves usability and is therefore strongly encouraged.
|
|
|
|
A demuxer MUST NOT demux a stream which contains more than one stream, or which
|
|
is wrapped in a structure to facilitate more than one stream or otherwise
|
|
duplicate the role of a container. Any such file is to be considered invalid.
|
|
For example Vorbis in Ogg in NUT is invalid, as is
|
|
mpegvideo + mpegaudio in MPEG-PS/TS in NUT or dvvideo + dvaudio in DV in NUT.
|
|
|
|
|
|
|
|
Sample code (Public Domain, & untested):
|
|
========================================
|
|
|
|
typedef BufferContext{
|
|
uint8_t *buf;
|
|
uint8_t *buf_ptr;
|
|
}BufferContext;
|
|
|
|
static inline uint64_t get_bytes(BufferContext *bc, int count){
|
|
uint64_t val=0;
|
|
|
|
assert(count>0 && count<9);
|
|
|
|
for(i=0; i<count; i++){
|
|
val <<=8;
|
|
val += *(bc->buf_ptr++);
|
|
}
|
|
|
|
return val;
|
|
}
|
|
|
|
static inline void put_bytes(BufferContext *bc, int count, uint64_t val){
|
|
uint64_t val=0;
|
|
|
|
assert(count>0 && count<9);
|
|
|
|
for(i=count-1; i>=0; i--){
|
|
*(bc->buf_ptr++)= val >> (8*i);
|
|
}
|
|
|
|
return val;
|
|
}
|
|
|
|
static inline uint64_t get_v(BufferContext *bc){
|
|
uint64_t val= 0;
|
|
|
|
for(; space_left(bc) > 0; ){
|
|
int tmp= *(bc->buf_ptr++);
|
|
if(tmp&0x80)
|
|
val= (val<<7) + tmp - 0x80;
|
|
else
|
|
return (val<<7) + tmp;
|
|
}
|
|
|
|
return -1;
|
|
}
|
|
|
|
static inline int put_v(BufferContext *bc, uint64_t val){
|
|
int i;
|
|
|
|
if(space_left(bc) < 9) return -1;
|
|
|
|
val &= 0x7FFFFFFFFFFFFFFFULL; // FIXME: Can only encode up to 63 bits ATM.
|
|
for(i=7; ; i+=7){
|
|
if(val>>i == 0) break;
|
|
}
|
|
|
|
for(i-=7; i>0; i-=7){
|
|
*(bc->buf_ptr++)= 0x80 | (val>>i);
|
|
}
|
|
*(bc->buf_ptr++)= val&0x7F;
|
|
|
|
return 0;
|
|
}
|
|
|
|
static int64_t get_dts(int64_t pts, int64_t *pts_cache, int delay, int reset){
|
|
if(reset) memset(pts_cache, -1, delay*sizeof(int64_t));
|
|
|
|
while(delay--){
|
|
int64_t t= pts_cache[delay];
|
|
if(t < pts){
|
|
pts_cache[delay]= pts;
|
|
pts= t;
|
|
}
|
|
}
|
|
|
|
return pts;
|
|
}
|
|
|
|
|
|
|
|
Authors:
|
|
========
|
|
|
|
Folks from the MPlayer developers mailing list (http://www.mplayerhq.hu/).
|
|
Authors in alphabetical order: (FIXME! Tell us if we left you out)
|
|
Beregszaszi, Alex (alex@fsn.hu)
|
|
Bunkus, Moritz (moritz@bunkus.org)
|
|
Diedrich, Tobias (ranma+mplayer@tdiedrich.de)
|
|
Felker, Rich (dalias@aerifal.cx)
|
|
Franz, Fabian (FabianFranz@gmx.de)
|
|
Gereoffy, Arpad (arpi@thot.banki.hu)
|
|
Hess, Andreas (jaska@gmx.net)
|
|
Niedermayer, Michael (michaelni@gmx.at)
|
|
Shimon, Oded (ods15@ods15.dyndns.org)
|