Prior to ARMv6, the destination registers of the SMULL instruction
must be distinct from the first source register. Marking the
outputs early-clobber ensures they are allocated registers that
do not overlap any input.
This restriction is dropped in ARMv6 and later, so allowing overlap
between input and output registers there might give better code.
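
For illustration only, a minimal sketch of the constraint described
above. The helper name mulh_sketch and the SMULL_OUT macro are
hypothetical and are not taken from the patched code. The sketch
assumes GCC-style extended asm and that the compiler defines the ACLE
macro __ARM_ARCH. When that macro is missing, it falls back to the
early-clobber form, which is safe on every architecture version. On
non-ARM targets a plain C fallback is used so the sketch still builds.

```c
#include <stdint.h>

/* Hypothetical macro: pick the output constraint for SMULL.
 * "=&r" (early-clobber) keeps the outputs away from all inputs;
 * "=r" lets the compiler overlap them, which ARMv6+ permits. */
#if defined(__ARM_ARCH) && __ARM_ARCH >= 6
#define SMULL_OUT "=r"
#else
#define SMULL_OUT "=&r"
#endif

/* Hypothetical helper: high 32 bits of a signed 32x32 -> 64 multiply. */
static inline int32_t mulh_sketch(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    int32_t lo, hi;
    /* smull RdLo, RdHi, Rm, Rs: %2 is the first source register (Rm). */
    __asm__ ("smull %0, %1, %2, %3"
             : SMULL_OUT(lo), SMULL_OUT(hi)
             : "r"(a), "r"(b));
    (void)lo;
    return hi;
#else
    /* Portable fallback; assumes >> on a negative value is arithmetic. */
    return (int32_t)(((int64_t)a * b) >> 32);
#endif
}
```

Only a single-instruction asm such as this one can safely drop the
early-clobber on ARMv6 and later. A multi-instruction sequence that
writes an output before it has read all of its inputs still needs it.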

Signed-off-by: Mans Rullgard <mans@mansr.com>