Another option, even if you can’t find a much faster approximation, is to unroll your loop 4 times and calculate 4 pows at the same time using SSE/VMX.

 

From: Jeff Russell [mailto:jeffdr@8monkeylabs.com]
Sent: 19 August 2010 2:16am
To: Game Development Algorithms
Subject: [Algorithms] fast pow() for limited inputs

 

So I need to speed up the CRT pow() function a bit, but I have some restrictions on the input which hopefully should give me some room to optimize:

I need to compute pow(x,y), where x is on [0,1], and y is guaranteed positive and could have an upper bound in the neighborhood of 1000.

I need "reasonable" accuracy (could be a little looser than the standard pow() implementation). I've searched online and found some bit twiddling approaches that claim to be very fast, but they seem to be too inaccurate for my purposes. I've tried implementing pow() as exp( log(x), y ), with my own cheap Taylor series in place of the natural log function. It did produce good output but wasn't very fast (slightly slower than the CRT pow()). It is probably worth mentioning before anyone asks that yes I have confirmed pow() as the bottleneck with a profiling tool ;-)

I would also love to just see a sample implementation of pow(), log(), and exp() somewhere, even that might be helpful.

Thanks,

Jeff

--
Jeff Russell
Engineer, 8monkey Labs
www.8monkeylabs.com