Fixes compilation with 32-bit LLVM, which did not allow a cast in an "m" operand.
Originally committed as revision 19086 to svn://svn.ffmpeg.org/ffmpeg/trunk
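
The commit message above does not show the affected code, so the following is only a minimal sketch of the general pattern, assuming GCC-style extended inline asm on x86. The function name add_stride and its parameters are hypothetical and not taken from FFmpeg. The idea is that a cast expression is not an lvalue, so a stricter compiler may refuse it for an "m" (memory) operand; doing the conversion into a local variable first gives the operand a real address.

```c
/* Hypothetical helper, not taken from the FFmpeg source. */
static long add_stride(long base, int stride)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Form that a stricter compiler may reject, because the cast
     * result is not an lvalue and so has no memory address:
     *     : "m"((long)stride)
     */
    long s = stride;  /* do the conversion first, into a real object */
    __asm__ ("add %1, %0"      /* base += s, with s read from memory */
             : "+r"(base)
             : "m"(s)
             : "cc");
    return base;
#else
    /* Portable fallback so the sketch builds on non-x86 targets. */
    return base + stride;
#endif
}
```

Calling add_stride(100, 16) adds the stride to the base through the memory operand. The extra local costs nothing in practice, since the compiler has to place the value in memory for an "m" constraint anyway.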
12.59% overall speedup on x86_32
9.98% overall speedup on x86_64
compared to gcc 4.3.3
Originally committed as revision 18903 to svn://svn.ffmpeg.org/ffmpeg/trunk