Re: [Algorithms] fast pow() for limited inputs
Brought to you by:
vexxed72
From: Jeff R. <je...@8m...> - 2010-08-19 15:32:16
|
Thanks Fabian, good info there. I'll check out minimax polynomials, and as both you and Simon pointed out base 2 log/exp would probably make more sense. Jeff On Thu, Aug 19, 2010 at 1:35 AM, Fabian Giesen <ry...@gm...> wrote: > On 18.08.2010 18:16, Jeff Russell wrote: > > So I need to speed up the CRT pow() function a bit, but I have some > > restrictions on the input which hopefully should give me some room to > > optimize: > > > > I need to compute pow(x,y), where x is on [0,1], and y is guaranteed > > positive and could have an upper bound in the neighborhood of 1000. > > Unfortunately, x in [0,1] isn't a big restriction for pow; it covers > roughly a quarter of all possible 32-bit floating point numbers over a > wide range of exponents. pow on [0,1] is, for all practical purposes, > just as expensive as a full-range one. > > What's a lot more interesting is the distribution of your x inside > [0,1], and what type of error bound you need. Some background first: A > full-precision pow tries to return a floating-point number that is as > close as possible to the result you would get if you were doing the > computation exactly then rounding to the nearest representable FP > number. The error of numerical approximations is usually specified in > ulps (units in the last place). With IEEE floating point, addition, > subtraction, multiplication and division are guaranteed to return > results accurate to within 0.5ulps (which is equivalent to saying that > the results are the same as if they had been performed exactly then > rounded). For a variety of reasons, you can't get the same error bound > on pow and a lot of other transcendental functions, but library > implementations are usually on the order of 1ulp or less, so they're > pretty accurate. > > Anyway, if you don't care about relative error, but rather need an > absolute error bound, you can play fast and loose with small x (as long > as y > 1) and the like. > > > I need "reasonable" accuracy (could be a little looser than the standard > > pow() implementation). I've searched online and found some bit twiddling > > approaches that claim to be very fast, but they seem to be too > > inaccurate for my purposes. I've tried implementing pow() as exp( > > log(x), y ), with my own cheap Taylor series in place of the natural log > > function. It did produce good output but wasn't very fast (slightly > > exp(log(x) * y) is already significantly worse than a "true" pow(x, y) > in terms of relative error, and it has some extra roundoff steps that > aren't directly visible: exp and log are usually implemented in terms of > exp2 and log2 (since that matches better with the internal FP > representation), so there's some invisible conversion factors in there. > > Taylor series approximations are quite useless in practice. You can get > a lot more precision for the same amount of work (or, equivalently, the > same precision with a lot less work) by using optimized approximation > polynomials (Minimax polynomials are usually the weapon of choice, but > you really need a Computer Algebra package if you want to build your own > approximation polynomials). log is still a fairly tough nut to crack > however, since it's hard to approximate properly with polynomials. > > Robin Green did a great GDC 2003 presentation on the subject, it's > available here: (transcendental functions are in the 2nd part, but you > should still start at the beginning) > > http://www.research.scea.com/gdc2003/fast-math-functions.html > > Big spoiler: There's no really great general solution for pow that's > significantly faster than what the CRT does. > > > slower than the CRT pow()). It is probably worth mentioning before > > anyone asks that yes I have confirmed pow() as the bottleneck with a > > profiling tool ;-) > > A bit more context would help. What are you pow'ing, and what do you do > with the results? General pow is hard, but for applications like > rendering, you can usually get by with rather rough approximations. > > > I would also love to just see a sample implementation of pow(), log(), > > and exp() somewhere, even that might be helpful. > > A handy reference is... the CRT source code. VC++ doesn't come with > source code to the math functions though - you can either just > disassemble (e.g. with objdump) or look at another CRT implementation. > glibc is a good candidate. > > glibc math implementations are in sysdeps/ieee754 for generic IEEE-754 > compliant platforms, with optimized versions for all relevant > architectures in sysdeps/<arch>. If you really want to know how it's > implemented :) > > -Fabian > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by > > Make an app they can't live without > Enter the BlackBerry Developer Challenge > http://p.sf.net/sfu/RIM-dev2dev > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- Jeff Russell Engineer, 8monkey Labs www.8monkeylabs.com |