Re: [Dri-devel] tdfx and Mesa blending bug fixes

On Thu, Oct 19, 2000 at 09:14:34AM -0600, Brian Paul wrote:
> Nathan Hand wrote:
> > 
> > On Thu, Oct 19, 2000 at 07:04:12AM -0600, Brian Paul wrote:
> > >
> > > #define DIV255(X)  (((X) * ((1 << 16) / 255) + 256) >> 16)
> > 
> > When this code is passed through gcc -O2 the assembly produced is
> > 
> >     movl %edx,%eax                 # %eax = x
> >     sall $8,%eax                   # %eax = x << 8
> >     leal 256(%edx,%eax),%eax       # %eax = (x << 8) + x + 256
> >     sarl $16,%eax                  # %eax = ((x << 8) + x + 256) >> 16
> 
> Huh?  I think you're looking at the x86 code for your method,
> not my DIV255 macro.

Both my method and your DIV255 method produce the same assembly if
compiled with -O2. You can verify this yourself with objdump.
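
For anyone who wants to repeat the check, here is a minimal sketch of a
file holding both forms. The function names and the file name div255.c
are only for illustration; they are not from the attached benchmark.
Compiling with "gcc -O2 -c div255.c" and running "objdump -d div255.o"
lets you compare the two function bodies side by side. The exact
instructions will depend on the gcc version and target.

```c
/* div255.c (hypothetical file name): two forms of the same computation. */

#define DIV255(X)  (((X) * ((1 << 16) / 255) + 256) >> 16)

/* Form 1: the DIV255 macro; the constant (1 << 16) / 255 is 257. */
int div255_macro(int x)
{
    return DIV255(x);
}

/* Form 2: the explicit shift-and-add version from this thread. */
int div255_shift(int x)
{
    return (((x << 8) + x) + 256) >> 16;
}
```

Both functions assume x is a product of two 8-bit values, i.e. in the
range 0..65025.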

> > So gcc converts DIV255 into the method I posted earlier.
> > 
> > > The macro's cost is an integer multiply, add and shift.
> > > The total cost for one color channel is 3 mults, 3 adds/subs and 1 shift.
> > 
> > After -O2 the cost is 2 shifts and 2 adds.
> > 
> > > Blinn's method uses 2 mults, 5 adds/subs, 5 shifts and a temp var.
> > >
> > > I haven't done any benchmarking.
> > 
> > I've attached a simple benchmark. Compile with -O2. Results are
> > 
> >    DIV255 5639408
> >    approx 5629603
> >    exact  5629793
> > 
> > Smaller is better. Numbers represent microseconds elapsed.
> > 
> > DIV255 without optimisation is 1 multiply, 1 add, 1 shift.
> > 
> >    (((x) * ((1 << 16) / 255) + 256) >> 16)
> > 
> > The approximate version isn't a significant win.
> > 
> >    ((x << 8) + x) >> 16
> > 
> > The exact version is the same as DIV255 after -O2, but is explicit.
> > 
> >    (((x << 8) + x) + 256) >> 16
> 
> I don't see how the int multiply in the DIV255 macro can be optimized
> away and be the same as your macro.

The int multiply is by a constant. (1 << 16) / 255 is 65536 / 255, which
truncates to 257 in integer arithmetic, and X * 257 = X * 256 + X * 1 =
(X << 8) + X. Any constant multiplier can be converted into a string of
shifts and adds, and GCC is smart enough to know when doing so is a win.
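
To illustrate the general idea, here is a rough sketch of multiplying
by a constant using one shift and one add per set bit of the constant.
The helper name mul_shift_add is made up for this example, and this is
not how GCC implements the transformation internally; the compiler
picks a sequence at compile time and may also use subtracts or lea.

```c
#include <stdint.h>

/* Hypothetical helper: compute x * c using only shifts and adds.
 * Each set bit k of c contributes (x << k) to the result.
 * Arithmetic wraps modulo 2^32, the same as uint32_t multiplication. */
uint32_t mul_shift_add(uint32_t x, uint32_t c)
{
    uint32_t result = 0;
    unsigned bit;

    for (bit = 0; c != 0; bit++, c >>= 1) {
        if (c & 1u)
            result += x << bit;
    }
    return result;
}
```

For c = 257 (binary 100000001) only bits 0 and 8 are set, so the loop
adds x and (x << 8), which is the two-term form discussed above.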

-- 
Sydney 2000 - The Best Olympic Games Ever