Re: [Dri-devel] tdfx and Mesa blending bug fixes

Keith Packard wrote:
> 
>         unsigned char src = ...;
>         unsigned char srcFactor = ...;
>         unsigned char dst = ...;
>         unsigned char dstFactor = ...;
>         unsigned int pix = src * srcFactor + dst * dstFactor;
>         unsigned char result = (pix + (pix >> 8)) >> 8;
> 
> I don't think this is quite soup yet, here's some more ideas -- from code
> stolen from Jim Blinn:
> 
> /* multiply two bytes representing 0..1 yielding a byte representing 0..1 */
> #define IntMult(a,b,t) ( (t) = (a) * (b) + 0x80, ( ( ( (t)>>8 ) + (t) )>>8 ) )
> 
> unsigned short t;
> unsigned short pix = IntMult(src,srcFactor,t) + IntMult(dst,dstFactor,t);
> unsigned char result = (pix | (0 - (pix >> 8)));
> 
> The final bit of twiddling saturates the result at 0xff with no compare.
> 
> I suspect the two IntMults could be merged to avoid all of the extra shifts,
> but that would take actually compiling and testing the code...

The actual computation I'm doing for the common case of
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) is

result = (source * alpha + dest * (255 - alpha)) / 255;

for each color channel.  All vars are in the range [0,255].

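With a little tinkering, I found that the following macro exactly
computes X / 255 (truncating integer division) for X in [0, 65534],
which covers the largest possible blend sum, 255 * 255 = 65025.
At X = 65535 it returns 256 instead of 257: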

#define DIV255(X)  (((X) * ((1 << 16) / 255) + 256) >> 16)

The macro's cost is an integer multiply, add and shift.
The total cost for one color channel is 3 mults, 3 adds/subs and 1 shift.
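
As a rough sketch of how the macro fits into the per-channel blend, and of how its exactness can be checked by brute force, something like the following should work. The function names blend_channel and div255_mismatches and the limit parameter are made up for illustration; they are not from Mesa or the tdfx driver.

```c
#include <stdint.h>

/* The macro from the message: multiply by 257, add 256, shift right 16. */
#define DIV255(X)  (((X) * ((1 << 16) / 255) + 256) >> 16)

/* Hypothetical helper: blend one 8-bit channel as
 * (source * alpha + dest * (255 - alpha)) / 255.
 * The sum is at most 255 * 255 = 65025, so it fits easily in an int. */
uint8_t blend_channel(uint8_t source, uint8_t dest, uint8_t alpha)
{
    unsigned int sum = source * alpha + dest * (255 - alpha);
    return (uint8_t) DIV255(sum);
}

/* Hypothetical checker: count the values x in [0, limit] for which
 * DIV255(x) differs from the plain integer division x / 255. */
unsigned int div255_mismatches(unsigned int limit)
{
    unsigned int x, bad = 0;
    for (x = 0; x <= limit; x++) {
        if (DIV255(x) != x / 255)
            bad++;
    }
    return bad;
}
```

Calling div255_mismatches(255 * 255) covers every sum the blend can produce; a return value of zero means the macro matches the division over that whole range.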

Blinn's method uses 2 mults, 5 adds/subs, 5 shifts and a temp var.
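
For comparison, here is a sketch of the quoted Blinn-style code written as functions instead of a macro with a temp var. The names int_mult and blend_channel_blinn are made up for illustration, and I'm assuming the same GL_SRC_ALPHA / GL_ONE_MINUS_SRC_ALPHA factors as above.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for two bytes, following the quoted IntMult macro. */
uint8_t int_mult(uint8_t a, uint8_t b)
{
    unsigned int t = a * b + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Hypothetical helper: blend one channel from two rounded products,
 * then clamp to 0xff with the compare-free trick from the quoted code. */
uint8_t blend_channel_blinn(uint8_t source, uint8_t dest, uint8_t alpha)
{
    unsigned int pix = int_mult(source, alpha)
                     + int_mult(dest, (uint8_t) (255 - alpha));
    /* If pix has bit 8 set, 0 - (pix >> 8) is all ones, so the low byte
     * becomes 0xff; otherwise pix passes through unchanged. */
    return (uint8_t) (pix | (0 - (pix >> 8)));
}
```

Note that this version rounds each product separately, while the formula above truncates the combined sum, so the two can disagree by one. For example, source = 1, dest = 0, alpha = 128 gives 1 here and 0 with the truncating division.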

I haven't done any benchmarking.

-Brian