Optimize put_hevc_qpel_bi_hv_8 with MMI for widths 4, 8, 12, 16, 24, 32, 48 and 64.
This optimization improves HEVC decoding performance by 11.4% (from 2.01x to 2.24x, tested on a Loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
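For reviewers less familiar with the function being vectorized, here is a minimal scalar sketch of what an 8-bit HEVC luma bi-predictive qpel interpolation with both horizontal and vertical fractional offsets computes. It assumes the standard HEVC 8-tap luma filters, a 16-bit intermediate covering 3 rows above and 4 rows below the block, and FFmpeg-style rounding (vertical result shifted right by 6, then averaged with the 14-bit first prediction `src2` using shift 7 and offset 64). The name `qpel_bi_hv_8_ref`, its parameter order and the `MAX_W`/`MAX_H` bounds are hypothetical. This is neither FFmpeg's C template nor the MMI code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HEVC luma 8-tap interpolation filters, indexed by fractional position
 * 1..3 (quarter, half, three-quarter sample); row 0 is unused. */
static const int8_t qpel_filters[4][8] = {
    {  0, 0,   0,  0,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

#define MAX_W 64      /* hypothetical bound: largest block width handled */
#define MAX_H 64      /* hypothetical bound: largest block height handled */
#define QPEL_EXTRA 7  /* 3 extra rows above + 4 extra rows below the block */

static uint8_t clip_uint8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Hypothetical scalar reference. src points at the top-left block pixel and
 * must have 3 valid pixels left/above and 4 right/below; mx, my are in 1..3. */
static void qpel_bi_hv_8_ref(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             const int16_t *src2, ptrdiff_t src2_stride,
                             int width, int height, int mx, int my)
{
    int16_t tmp[(MAX_H + QPEL_EXTRA) * MAX_W];
    const int8_t *fh = qpel_filters[mx];
    const int8_t *fv = qpel_filters[my];
    const int shift  = 7;                /* 14 + 1 - bit depth (8) */
    const int offset = 1 << (shift - 1); /* rounding term: 64 */

    assert(width > 0 && width <= MAX_W && height > 0 && height <= MAX_H);
    assert(mx >= 1 && mx <= 3 && my >= 1 && my <= 3);

    /* Horizontal pass over height + 7 rows, starting 3 rows above the block.
     * For 8-bit input the sums fit in int16_t, so no shift is needed here. */
    src -= 3 * src_stride;
    for (int y = 0; y < height + QPEL_EXTRA; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += fh[k] * src[x + k - 3];
            tmp[y * MAX_W + x] = (int16_t)sum;
        }
        src += src_stride;
    }

    /* Vertical pass on the intermediate, then round-average with src2. */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += fv[k] * tmp[(y + k) * MAX_W + x];
            sum >>= 6;
            dst[y * dst_stride + x] =
                clip_uint8((sum + src2[y * src2_stride + x] + offset) >> shift);
        }
    }
}
```

Because every filter's taps sum to 64, a flat block of value 100 combined with `src2` set to `100 << 6` yields 100 at every output position, which makes a convenient sanity check when comparing a SIMD version against a scalar one.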