Here follows my patch. Please excuse the long delay.
The original code unrolls its loops by hand. I did not do that, because
I assume the compiler can unroll them much more efficiently.
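
For illustration only (this is not the patch itself), here is a minimal
sketch of the convex combination described in the quoted text below,
written as one plain loop with no manual unrolling. The function name
matte_blend, the flat 8-bit buffers, the clamping of out-of-range luma,
and the choice that a matte luma of 235 selects source a entirely are
all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a luma sample to the nominal video range [16, 235].
 * Assumption: out-of-range matte values are clamped, not rejected. */
static inline uint8_t clamp_luma(uint8_t y)
{
    return y < 16 ? 16 : (y > 235 ? 235 : y);
}

/* Hypothetical helper: blend n samples of a and b into dst.
 * The weight w = luma - 16 lies in [0, 219], so
 *   dst = (a * w + b * (219 - w)) / 219
 * is a convex combination of the two inputs. Adding 109 before the
 * division rounds to nearest instead of truncating. */
void matte_blend(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                 const uint8_t *matte_luma, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned w = clamp_luma(matte_luma[i]) - 16u;
        dst[i] = (uint8_t)((a[i] * w + b[i] * (219u - w) + 109u) / 219u);
    }
}
```

With a = 200 and b = 100, a matte luma of 16 yields 100, a luma of 235
yields 200, and values in between interpolate linearly. The loop is left
plain so the compiler is free to unroll or vectorize it at its usual
optimization levels.
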
> > This is what I did. The luminance channel of the matte is used in a
> > convex combination of the input sources thus using the [16,235]/219
> > range.
>
> Well, post your patch, already. You fixed a bug! :)