Essentially the same as d1d2370d07 except
for clear_rgba_overlay. From some empirical testing, the s->x1 value is
either 0 or SLICE_W (256). In the case where it is 256 and s->x0 is 0,
then it writes memory. For most slices, this is fine. However for the
very last slice in the loop, it is possible for the width here to exceed
the total width of the line (p->w). If that occurs, the memset triggers
a buffer overflow. This last case needs to be guarded in the same way as
the previously mentioned commit (total width - SLICE_W * x) and with
MPMIN in case s->x1 is 0 here to preserve the old behavior. It's unknown
if anything other than dmabuf-wayland can possibly hit this, but it's
definitely a problem within mp_draw_sub_overlay itself.