takes the most time, and it allows for more efficient unaligned access
and better control over memory latencies.
Originally committed as revision 711 to svn://svn.ffmpeg.org/ffmpeg/trunk
allows better scheduling of memory accesses, and is portable across
all compilers.
Originally committed as revision 709 to svn://svn.ffmpeg.org/ffmpeg/trunk