The loops were reading one line ahead, and that extra read could
fall outside the buffer for reference blocks at the edge of the
picture. Removing the readahead has no measurable performance impact.
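
As a rough illustration only (this is not the code touched by this
change; the function names, the fixed width W and the row counters
are hypothetical), the difference between the two loop shapes can be
sketched in C. The pipelined form loads row y + 1 while storing row
y, so for a block of h rows it reads h + 1 source rows:

```c
#include <stdint.h>
#include <string.h>

enum { W = 8 };  /* hypothetical block width, for illustration only */

/* Read-ahead form: the row for the next iteration is loaded before the
 * current row is stored.  Returns the number of source rows read, which
 * is h + 1, i.e. one row past the end of the block. */
static int copy_block_readahead(uint8_t *dst, const uint8_t *src,
                                int stride, int h)
{
    uint8_t cur[W], next[W];
    int rows_read = 0;

    memcpy(cur, src, W);                          /* prologue: load row 0 */
    rows_read++;
    for (int y = 0; y < h; y++) {
        /* On the last iteration this loads row h, past the block. */
        memcpy(next, src + (y + 1) * stride, W);
        rows_read++;
        memcpy(dst + y * stride, cur, W);         /* store row y */
        memcpy(cur, next, W);
    }
    return rows_read;
}

/* Form without read-ahead: each iteration loads only the row it stores.
 * Returns the number of source rows read, which is exactly h. */
static int copy_block(uint8_t *dst, const uint8_t *src, int stride, int h)
{
    int rows_read = 0;

    for (int y = 0; y < h; y++) {
        memcpy(dst + y * stride, src + y * stride, W);
        rows_read++;
    }
    return rows_read;
}
```

Both functions produce the same output; only the second stays within
the h rows of the source block, which is what matters when that block
ends at the last line of the buffer.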
Signed-off-by: Mans Rullgard <mans@mansr.com>
The standard syntax requires both transfer registers to be written
explicitly for LDRD/STRD instructions, e.g. "ldrd r2, r3, [r1]".
Some versions of the GNU assembler also accept a single register
with the second one implicit, e.g. "ldrd r2, [r1]"; others are
stricter.
Signed-off-by: Mans Rullgard <mans@mansr.com>