From: Brian Paul <brianp@va...>  2000-10-19 13:57:54

Keith Packard wrote:
>
>     unsigned char  src = ...;
>     unsigned char  srcFactor = ...;
>     unsigned char  dst = ...;
>     unsigned char  dstFactor = ...;
>     unsigned int   pix = src * srcFactor + dst * dstFactor;
>     unsigned char  result = (pix + (pix >> 8)) >> 8;
>
> I don't think this is quite soup yet; here are some more ideas from code
> stolen from Jim Blinn:
>
> /* multiply two bytes representing 0..1 yielding a byte representing 0..1 */
> #define IntMult(a,b,t) ( (t) = (a) * (b) + 0x80, ( ( ( (t)>>8 ) + (t) )>>8 ) )
>
>     unsigned short t;
>     unsigned short pix = IntMult(src,srcFactor,t) + IntMult(dst,dstFactor,t);
>     unsigned char  result = (pix | (0 - (pix >> 8)));
>
> The final bit of twiddling saturates the result at 0xff with no compare.
>
> I suspect the two IntMults could be merged to avoid all of the extra shifts,
> but that would take actually compiling and testing the code...

The actual computation I'm doing for the common case of
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) is

    result = (source * alpha + dest * (255 - alpha)) / 255;

for each color channel.  All vars are in the range [0,255].

With a little tinkering, I found that the following macro exactly computes
X / 255 (truncating, like integer division) for X in [0, 65534], which
covers the largest blend numerator, 255 * 255 = 65025:

#define DIV255(X) (((X) * ((1 << 16) / 255) + 256) >> 16)

The macro's cost is an integer multiply, add and shift.

The total cost for one color channel is 3 mults, 3 adds/subs and 1 shift.
Blinn's method uses 2 mults, 5 adds/subs, 5 shifts and a temp var.

I haven't done any benchmarking.

Brian
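A small C sketch for checking the arithmetic discussed in this thread. DIV255 and IntMult are the macros quoted above; blend_exact, blend_div255 and blend_blinn are hypothetical names used only for illustration. Since (1 << 16) / 255 is 257, DIV255(X) works out to (X * 257 + 256) >> 16. That agrees with truncating X / 255 for every X up to 65534 and is off by one only at X = 65535, so it is exact for this blend, whose numerator never exceeds 65025. Note that Blinn's IntMult rounds a * b / 255, while the divide in the blend truncates, so the two methods can differ by one. The sketch also uses two temporaries instead of one shared `t`, because modifying the same variable in both operands of `+` is undefined behavior in C.

```c
#include <assert.h>

/* Brian's macro.  (1 << 16) / 255 == 257, so this is (X * 257 + 256) >> 16.
   It matches truncating X / 255 for X in [0, 65534]; at X = 65535 it
   yields 256 instead of 257. */
#define DIV255(X) (((X) * ((1 << 16) / 255) + 256) >> 16)

/* Blinn's byte multiply as quoted by Keith: rounds a * b / 255 for bytes. */
#define IntMult(a, b, t) ((t) = (a) * (b) + 0x80, ((((t) >> 8) + (t)) >> 8))

/* Hypothetical reference: the blend exactly as written, with a real divide. */
static unsigned blend_exact(unsigned src, unsigned dst, unsigned alpha)
{
    return (src * alpha + dst * (255 - alpha)) / 255;
}

/* Hypothetical helper: the same blend with the divide replaced by DIV255. */
static unsigned blend_div255(unsigned src, unsigned dst, unsigned alpha)
{
    unsigned x = src * alpha + dst * (255 - alpha);
    assert(x <= 255u * 255u); /* largest numerator for byte inputs */
    return DIV255(x);
}

/* Hypothetical helper: Keith's general form using Blinn's IntMult.
   Two temporaries on purpose: writing one shared `t` inside both
   IntMult calls of a single expression is undefined behavior in C. */
static unsigned char blend_blinn(unsigned src, unsigned srcFactor,
                                 unsigned dst, unsigned dstFactor)
{
    unsigned short t1, t2;
    unsigned short pix = IntMult(src, srcFactor, t1) + IntMult(dst, dstFactor, t2);
    /* pix >> 8 is 1 only when pix > 255; 0 - 1 sets every bit, so the
       OR clamps the byte to 0xff without a compare. */
    return (unsigned char)(pix | (0 - (pix >> 8)));
}
```

For example, blend_div255(s, d, a) equals blend_exact(s, d, a) for all byte inputs, while blend_blinn(1, 200, 0, 55) gives 1 where the truncating divide gives 0. Calling blend_blinn(255, 255, 255, 255) shows the saturation: the sum 510 is clamped to 255.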