Re: [Algorithms] fast pow() for limited inputs


On 19.08.2010 10:57, Robin Green wrote:
> On Wed, Aug 18, 2010 at 11:35 PM, Fabian Giesen<ry...@gm...>  wrote:
>>
>>> I would also love to just see a sample implementation of pow(), log(),
>>> and exp() somewhere, even that might be helpful.
>>
>> glibc math implementations are in sysdeps/ieee754 for generic IEEE-754
>> compliant platforms, with optimized versions for all relevant
>> architectures in sysdeps/<arch>. If you really want to know how it's
>> implemented :)
>
>
> What he said.
>
> Also, take a look at the CEPHES library for platform agnostic
> reference implementations of the C math functions and some extras like
> cotangent, cuberoot and integer powers:
>
>      http://www.netlib.org/cephes/
>
> And here's an X86 specific implementation of powf() that claims to be
> faster (than what, it doesn't say):
>
>     http://www.xyzw.de/c190.html

Now that's interesting :). I wrote most of that header file around 2000. 
It's faster than what used to be the standard pow() implementation on 
x86 (as in the VC++ 6.0 runtime library), using fscale (that method is 
still used for sFExp below). This is all code for 64k intros, so it was 
originally optimized for size, but pow was a bottleneck during texture 
generation, and Agner Fog's version was 20-30% faster if I recall 
correctly. (This was back when P3s were the norm, though; no idea how it 
compares now.) The main change is to replace the fscale (which used to 
be very slow on some processors) with a longer code sequence that's 
faster.

The original code sequence used to be commented out before the "// 
faster pow" comment, but I guess that got removed at some point :).

Since VS2002 or 2003, the C library contains a much better pow() 
implementation (using SSE on processors that support it) that should be 
faster than this code. It's also a lot bigger though.

-Fabian