I've been sitting on this for 3 1/2 years now(!), and I finally got
around to fixing the loose ends and convincing myself that it was
correct. It follows the same basic structure as yadif_cuda, including
leaving out the edge handling, to avoid expensive branching.