From: Nathan Hand <nathanh@ma...>  2000-10-19 16:49:37

On Thu, Oct 19, 2000 at 09:14:34AM -0600, Brian Paul wrote:
> Nathan Hand wrote:
> >
> > On Thu, Oct 19, 2000 at 07:04:12AM -0600, Brian Paul wrote:
> > >
> > > #define DIV255(X) (((X) * ((1 << 16) / 255) + 256) >> 16)
> >
> > When this code is passed through gcc -O2 the assembly produced is
> >
> >     movl %edx,%eax            # %eax = x
> >     sall $8,%eax              # %eax = x << 8
> >     leal 256(%edx,%eax),%eax  # %eax = (x << 8) + x + 256
> >     sarl $16,%eax             # %eax = ((x << 8) + x + 256) >> 16
>
> Huh?  I think you're looking at the x86 code for your method,
> not my DIV255 macro.

Both my method and your DIV255 method produce the same assembly if
compiled with -O2. You can verify this yourself with objdump.

> > So gcc converts DIV255 into the method I posted earlier.
> >
> > > The macro's cost is an integer multiply, add and shift.
> > > The total cost for one color channel is 3 mults, 3 adds/subs and 1 shift.
> >
> > After -O2 the cost is 2 shifts and 2 adds.
> >
> > > Blinn's method uses 2 mults, 5 adds/subs, 5 shifts and a temp var.
> > >
> > > I haven't done any benchmarking.
> >
> > I've attached a simple benchmark. Compile with -O2. Results are
> >
> >     DIV255 5639408
> >     approx 5629603
> >     exact  5629793
> >
> > Smaller is better. Numbers represent microseconds elapsed.
> >
> > DIV255 without optimisation is 1 multiply, 1 add, 1 shift.
> >
> >     (((x) * ((1 << 16) / 255) + 256) >> 16)
> >
> > The approximate version isn't a significant win.
> >
> >     ((x << 8) + x) >> 16
> >
> > The exact version is the same as DIV255 after -O2, but is explicit.
> >
> >     (((x << 8) + x) + 256) >> 16
>
> I don't see how the int multiply in the DIV255 macro can be optimized
> away and be the same as your macro.

The int multiply is by a constant: (1 << 16) / 255 is 257 in integer
arithmetic, and X * 257 = X * 256 + X * 1. It's possible to convert any
constant multiplier into strings of shifts and adds. GCC is smart enough
to know when doing so is a win.
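
For anyone who would rather check the arithmetic than read assembly, here
is a minimal sketch. It is not the attached benchmark. The names
DIV255_EXACT, DIV255_APPROX, MAX_PRODUCT and count_approx_mismatches are
labels made up for this illustration, and the input range is assumed to be
the product of two 8-bit channel values (0 to 255 * 255).

```c
#include <assert.h>

/* Brian's macro: (1 << 16) / 255 is the integer constant 257. */
#define DIV255(X) (((X) * ((1 << 16) / 255) + 256) >> 16)

/* The explicit shift-and-add forms quoted in this thread
   (macro names are hypothetical, chosen for this sketch). */
#define DIV255_EXACT(X)  ((((X) << 8) + (X) + 256) >> 16)
#define DIV255_APPROX(X) ((((X) << 8) + (X)) >> 16)

/* Assumed input range: product of two 8-bit values. */
#define MAX_PRODUCT (255 * 255)

/* Walks every input in [0, MAX_PRODUCT]. Asserts that DIV255 and the
   explicit "exact" form agree with each other and with x / 255, and
   returns how many inputs the "approx" form gets wrong. */
int count_approx_mismatches(void)
{
    int x, bad = 0;

    for (x = 0; x <= MAX_PRODUCT; x++) {
        assert(DIV255(x) == DIV255_EXACT(x));
        assert(DIV255(x) == x / 255);
        if (DIV255_APPROX(x) != x / 255)
            bad++;
    }
    return bad;
}
```

Over that assumed range the approximate form should only disagree with
x / 255 at the nonzero multiples of 255, where dropping the + 256 makes it
come out one too low. For larger inputs none of this has been checked.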