~4 CPU cycles faster for the matrixbench video

Originally committed as revision 9856 to svn://svn.ffmpeg.org/ffmpeg/trunk