Thanks Fabian, good info there.
I'll check out minimax polynomials, and as both you and Simon pointed out base 2 log/exp would probably make more sense.
On 18.08.2010 18:16, Jeff Russell wrote:Unfortunately, x in [0,1] isn't a big restriction for pow; it covers
> So I need to speed up the CRT pow() function a bit, but I have some
> restrictions on the input which hopefully should give me some room to
> I need to compute pow(x,y), where x is on [0,1], and y is guaranteed
> positive and could have an upper bound in the neighborhood of 1000.
roughly a quarter of all possible 32-bit floating point numbers over a
wide range of exponents. pow on [0,1] is, for all practical purposes,
just as expensive as a full-range one.
What's a lot more interesting is the distribution of your x inside
[0,1], and what type of error bound you need. Some background first: A
full-precision pow tries to return a floating-point number that is as
close as possible to the result you would get if you were doing the
computation exactly then rounding to the nearest representable FP
number. The error of numerical approximations is usually specified in
ulps (units in the last place). With IEEE floating point, addition,
subtraction, multiplication and division are guaranteed to return
results accurate to within 0.5ulps (which is equivalent to saying that
the results are the same as if they had been performed exactly then
rounded). For a variety of reasons, you can't get the same error bound
on pow and a lot of other transcendental functions, but library
implementations are usually on the order of 1ulp or less, so they're
Anyway, if you don't care about relative error, but rather need an
absolute error bound, you can play fast and loose with small x (as long
as y > 1) and the like.
exp(log(x) * y) is already significantly worse than a "true" pow(x, y)
> I need "reasonable" accuracy (could be a little looser than the standard
> pow() implementation). I've searched online and found some bit twiddling
> approaches that claim to be very fast, but they seem to be too
> inaccurate for my purposes. I've tried implementing pow() as exp(
> log(x), y ), with my own cheap Taylor series in place of the natural log
> function. It did produce good output but wasn't very fast (slightly
in terms of relative error, and it has some extra roundoff steps that
aren't directly visible: exp and log are usually implemented in terms of
exp2 and log2 (since that matches better with the internal FP
representation), so there's some invisible conversion factors in there.
Taylor series approximations are quite useless in practice. You can get
a lot more precision for the same amount of work (or, equivalently, the
same precision with a lot less work) by using optimized approximation
polynomials (Minimax polynomials are usually the weapon of choice, but
you really need a Computer Algebra package if you want to build your own
approximation polynomials). log is still a fairly tough nut to crack
however, since it's hard to approximate properly with polynomials.
Robin Green did a great GDC 2003 presentation on the subject, it's
available here: (transcendental functions are in the 2nd part, but you
should still start at the beginning)
Big spoiler: There's no really great general solution for pow that's
significantly faster than what the CRT does.
A bit more context would help. What are you pow'ing, and what do you do
> slower than the CRT pow()). It is probably worth mentioning before
> anyone asks that yes I have confirmed pow() as the bottleneck with a
> profiling tool ;-)
with the results? General pow is hard, but for applications like
rendering, you can usually get by with rather rough approximations.
A handy reference is... the CRT source code. VC++ doesn't come with
> I would also love to just see a sample implementation of pow(), log(),
> and exp() somewhere, even that might be helpful.
source code to the math functions though - you can either just
disassemble (e.g. with objdump) or look at another CRT implementation.
glibc is a good candidate.
glibc math implementations are in sysdeps/ieee754 for generic IEEE-754
compliant platforms, with optimized versions for all relevant
architectures in sysdeps/<arch>. If you really want to know how it's
This SF.net email is sponsored by
Make an app they can't live without
Enter the BlackBerry Developer Challenge
GDAlgorithms-list mailing list