I'll check out minimax polynomials, and as both you and Simon pointed out base 2 log/exp would probably make more sense.

Jeff

On Thu, Aug 19, 2010 at 1:35 AM, Fabian Giesen <ryg@gmx.net> wrote:

On 18.08.2010 18:16, Jeff Russell wrote:Unfortunately, x in [0,1] isn't a big restriction for pow; it covers

> So I need to speed up the CRT pow() function a bit, but I have some

> restrictions on the input which hopefully should give me some room to

> optimize:

>

> I need to compute pow(x,y), where x is on [0,1], and y is guaranteed

> positive and could have an upper bound in the neighborhood of 1000.

roughly a quarter of all possible 32-bit floating point numbers over a

wide range of exponents. pow on [0,1] is, for all practical purposes,

just as expensive as a full-range one.

What's a lot more interesting is the distribution of your x inside

[0,1], and what type of error bound you need. Some background first: A

full-precision pow tries to return a floating-point number that is as

close as possible to the result you would get if you were doing the

computation exactly then rounding to the nearest representable FP

number. The error of numerical approximations is usually specified in

ulps (units in the last place). With IEEE floating point, addition,

subtraction, multiplication and division are guaranteed to return

results accurate to within 0.5ulps (which is equivalent to saying that

the results are the same as if they had been performed exactly then

rounded). For a variety of reasons, you can't get the same error bound

on pow and a lot of other transcendental functions, but library

implementations are usually on the order of 1ulp or less, so they're

pretty accurate.

Anyway, if you don't care about relative error, but rather need an

absolute error bound, you can play fast and loose with small x (as long

as y > 1) and the like.

exp(log(x) * y) is already significantly worse than a "true" pow(x, y)

> I need "reasonable" accuracy (could be a little looser than the standard

> pow() implementation). I've searched online and found some bit twiddling

> approaches that claim to be very fast, but they seem to be too

> inaccurate for my purposes. I've tried implementing pow() as exp(

> log(x), y ), with my own cheap Taylor series in place of the natural log

> function. It did produce good output but wasn't very fast (slightly

in terms of relative error, and it has some extra roundoff steps that

aren't directly visible: exp and log are usually implemented in terms of

exp2 and log2 (since that matches better with the internal FP

representation), so there's some invisible conversion factors in there.

Taylor series approximations are quite useless in practice. You can get

a lot more precision for the same amount of work (or, equivalently, the

same precision with a lot less work) by using optimized approximation

polynomials (Minimax polynomials are usually the weapon of choice, but

you really need a Computer Algebra package if you want to build your own

approximation polynomials). log is still a fairly tough nut to crack

however, since it's hard to approximate properly with polynomials.

Robin Green did a great GDC 2003 presentation on the subject, it's

available here: (transcendental functions are in the 2nd part, but you

should still start at the beginning)

http://www.research.scea.com/gdc2003/fast-math-functions.html

Big spoiler: There's no really great general solution for pow that's

significantly faster than what the CRT does.

A bit more context would help. What are you pow'ing, and what do you do

> slower than the CRT pow()). It is probably worth mentioning before

> anyone asks that yes I have confirmed pow() as the bottleneck with a

> profiling tool ;-)

with the results? General pow is hard, but for applications like

rendering, you can usually get by with rather rough approximations.

A handy reference is... the CRT source code. VC++ doesn't come with

> I would also love to just see a sample implementation of pow(), log(),

> and exp() somewhere, even that might be helpful.

source code to the math functions though - you can either just

disassemble (e.g. with objdump) or look at another CRT implementation.

glibc is a good candidate.

glibc math implementations are in sysdeps/ieee754 for generic IEEE-754

compliant platforms, with optimized versions for all relevant

architectures in sysdeps/<arch>. If you really want to know how it's

implemented :)

-Fabian

------------------------------------------------------------------------------

This SF.net email is sponsored by

Make an app they can't live without

Enter the BlackBerry Developer Challenge

http://p.sf.net/sfu/RIM-dev2dev

_______________________________________________

GDAlgorithms-list mailing list

GDAlgorithms-list@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list

Archives:

http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list

--

Jeff Russell

Engineer, 8monkey Labs

www.8monkeylabs.com