Re: [Algorithms] fast pow() for limited inputs

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Thanks Fabian, good info there.

I'll check out minimax polynomials, and as both you and Simon pointed out
base 2 log/exp would probably make more sense.

Jeff

On Thu, Aug 19, 2010 at 1:35 AM, Fabian Giesen <ry...@gm...> wrote:

> On 18.08.2010 18:16, Jeff Russell wrote:
> > So I need to speed up the CRT pow() function a bit, but I have some
> > restrictions on the input which hopefully should give me some room to
> > optimize:
> >
> > I need to compute pow(x,y), where x is on [0,1], and y is guaranteed
> > positive and could have an upper bound in the neighborhood of 1000.
>
> Unfortunately, x in [0,1] isn't a big restriction for pow; it covers
> roughly a quarter of all possible 32-bit floating point numbers over a
> wide range of exponents. pow on [0,1] is, for all practical purposes,
> just as expensive as a full-range one.
>
> What's a lot more interesting is the distribution of your x inside
> [0,1], and what type of error bound you need. Some background first: A
> full-precision pow tries to return a floating-point number that is as
> close as possible to the result you would get if you were doing the
> computation exactly then rounding to the nearest representable FP
> number. The error of numerical approximations is usually specified in
> ulps (units in the last place). With IEEE floating point, addition,
> subtraction, multiplication and division are guaranteed to return
> results accurate to within 0.5ulps (which is equivalent to saying that
> the results are the same as if they had been performed exactly then
> rounded). For a variety of reasons, you can't get the same error bound
> on pow and a lot of other transcendental functions, but library
> implementations are usually on the order of 1ulp or less, so they're
> pretty accurate.
>
> Anyway, if you don't care about relative error, but rather need an
> absolute error bound, you can play fast and loose with small x (as long
> as y > 1) and the like.
>
> > I need "reasonable" accuracy (could be a little looser than the standard
> > pow() implementation). I've searched online and found some bit twiddling
> > approaches that claim to be very fast, but they seem to be too
> > inaccurate for my purposes. I've tried implementing pow() as exp(
> > log(x), y ), with my own cheap Taylor series in place of the natural log
> > function. It did produce good output but wasn't very fast (slightly
>
> exp(log(x) * y) is already significantly worse than a "true" pow(x, y)
> in terms of relative error, and it has some extra roundoff steps that
> aren't directly visible: exp and log are usually implemented in terms of
> exp2 and log2 (since that matches better with the internal FP
> representation), so there's some invisible conversion factors in there.
>
> Taylor series approximations are quite useless in practice. You can get
> a lot more precision for the same amount of work (or, equivalently, the
> same precision with a lot less work) by using optimized approximation
> polynomials (Minimax polynomials are usually the weapon of choice, but
> you really need a Computer Algebra package if you want to build your own
> approximation polynomials). log is still a fairly tough nut to crack
> however, since it's hard to approximate properly with polynomials.
>
> Robin Green did a great GDC 2003 presentation on the subject, it's
> available here: (transcendental functions are in the 2nd part, but you
> should still start at the beginning)
>
>   http://www.research.scea.com/gdc2003/fast-math-functions.html
>
> Big spoiler: There's no really great general solution for pow that's
> significantly faster than what the CRT does.
>
> > slower than the CRT pow()). It is probably worth mentioning before
> > anyone asks that yes I have confirmed pow() as the bottleneck with a
> > profiling tool ;-)
>
> A bit more context would help. What are you pow'ing, and what do you do
> with the results? General pow is hard, but for applications like
> rendering, you can usually get by with rather rough approximations.
>
> > I would also love to just see a sample implementation of pow(), log(),
> > and exp() somewhere, even that might be helpful.
>
> A handy reference is... the CRT source code. VC++ doesn't come with
> source code to the math functions though - you can either just
> disassemble (e.g. with objdump) or look at another CRT implementation.
> glibc is a good candidate.
>
> glibc math implementations are in sysdeps/ieee754 for generic IEEE-754
> compliant platforms, with optimized versions for all relevant
> architectures in sysdeps/<arch>. If you really want to know how it's
> implemented :)
>
> -Fabian
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by
>
> Make an app they can't live without
> Enter the BlackBerry Developer Challenge
> http://p.sf.net/sfu/RIM-dev2dev
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
>

-- 
Jeff Russell
Engineer, 8monkey Labs
www.8monkeylabs.com