Copy the SSE4.1 implementation from https://github.com/BLAKE2/BLAKE2.

Signed-off-by: David Sterba <dsterba@suse.com>