We generally want 2 things:
1. minimal wakeups for decoding each frame
2. minimal number of frames decoded during continuous seeking

Commit 35810cb8 changed this logic and fixed 1., but it broke 2.: when
you seek continuously (for example, with an arrow key held down), it
now decodes 2 frames per seek instead of 1. This made seeking appear
slower.

Fix this by making the logic more explicit. In particular, call the
filters only if we actually try to get a new frame.
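
The following C sketch is a toy model of the explicit logic described
above, not mpv's actual code: the struct, the functions, and the single
have_frame flag are hypothetical stand-ins. It assumes decoding and
filtering happen together in one step, and it shows that when the
filters run only while no frame is waiting, each seek costs exactly one
decode, even if the VO does not take the frame before the next seek.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the decode loop; none of these names are mpv's. */
struct player {
    bool have_frame;     /* a filtered frame is waiting for the VO */
    int frames_decoded;  /* number of decode + filter runs */
};

/* Stand-in for "decode a packet and push it through the filters". */
static void decode_and_filter(struct player *p)
{
    p->frames_decoded++;
    p->have_frame = true;
}

/* Called on every wakeup. The filters run only when no frame is
 * waiting, i.e. only when we actually try to get a new frame. */
static void video_update(struct player *p, bool vo_ready)
{
    if (!p->have_frame)
        decode_and_filter(p);
    if (vo_ready) {
        assert(p->have_frame);
        p->have_frame = false;  /* the VO accepted the frame */
    }
}

/* A seek discards the waiting frame, so the next update decodes one. */
static void seek(struct player *p)
{
    p->have_frame = false;
}
```

In this model, holding an arrow key down is a series of seek() calls,
each followed by one or more video_update() calls with vo_ready false;
frames_decoded then grows by exactly one per seek.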

When playing with --no-audio and all other distractions (such as the
OSC) disabled, the player still wakes up twice per frame, but the
second wakeup happens only because the VO has not yet accepted the new
frame.