Inside Zen FSQRT

When David Johnson introduced the Zen floating point library to 4tH, many functions were still missing. The most important one was FSQRT, which is used by several other floating point functions.

In essence, it's quite easy to make an FSQRT function. The algorithm has been known for - literally - ages. It is better known as the Babylonian method:

x ~ (x + (S/x))/2

Where S is the original number and x is the square root we're looking for. Unfortunately, the speed of the algorithm depends on the accuracy of the initial estimate: if it's way off, many more iterations are needed. Worse, computers are lousy when it comes to guessing.

Of course, you can start off with the number itself and iterate until you get there, since this is a quadratically convergent algorithm, which means that the number of correct digits of the approximation roughly doubles with each iteration. But floating point arithmetic is expensive, so performance would suffer.
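To make the iteration concrete, here is a minimal sketch of the Babylonian method in plain floating point, using the number itself as the naive first estimate. This is illustrative Python, not the actual 4tH/Zen source; the function name and tolerance are my own.

```python
def babylonian_sqrt(s, tolerance=1e-12):
    """Babylonian (Newton) iteration: x' = (x + S/x) / 2."""
    if s == 0:
        return 0.0
    x = s                       # naive first estimate: the number itself
    while True:
        nxt = (x + s / x) / 2   # one Babylonian step
        if abs(nxt - x) < tolerance:
            return nxt
        x = nxt

print(babylonian_sqrt(2.0))     # close to 1.41421356...
```

Even from this bad starting point it converges in a handful of steps, which is the quadratic convergence at work; the cost is that every step is a floating point divide.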

Zen uses a shared stack and stores its numbers in decimal as two signed integers:

mantissa exponent

This allows us to represent the square root as:

SQRT(mantissa) exponent/2

As long as the exponent is even, that is. We can resolve the square root of the mantissa with the same algorithm, but using integer arithmetic, which is much, much faster. The catch is that we want to make our estimate as accurate as possible, with as many correct digits as possible. For that we need to boost our mantissa to the max, which means:

MAX-INT/10 < mantissa < MAX-INT

Where MAX-INT equals the largest possible signed integer, which of course differs depending on the width of a cell on a particular processor (e.g. 16, 32 or 64 bits). It works like this: let's say we're trying to get the square root of 123.45, which is stored on the stack as:

12345 -2

Which equals 12345 * 10^-2. We can boost the mantissa by multiplying it by 10 and decrementing the exponent, since:

12345 * 10^-2 = 123450 * 10^-3

We just do this until we pass the MAX-INT/10 threshold and we're there. Let's represent our mantissa in 8 bits binary (it works the same for larger words, it's just easier to show). We maxed out our mantissa, so it should be close to:
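The boosting step above can be sketched like this, here assuming a hypothetical 32-bit cell and a positive mantissa (again illustrative Python, not the Zen source):

```python
MAX_INT = 2 ** 31 - 1              # largest signed integer on a 32-bit cell

def boost(mantissa, exponent):
    """Multiply by 10 and decrement the exponent until
    MAX-INT/10 < mantissa <= MAX-INT; the value is unchanged."""
    while mantissa <= MAX_INT // 10:
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent

print(boost(12345, -2))            # (1234500000, -7), still 123.45
```

Note that 1234500000 * 10^-7 is still 123.45; we've only traded exponent for mantissa digits.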

1111 1111

True? Not quite, for two reasons:

  1. It's a signed number, so we have a sign bit;
  2. We're using decimal, so we won't max out all eight bits.

For argument's sake, let's say we're using a maximum of six bits. Then our root will be close to:

0111

Since:

0111 * 0111 = 00110001

Note we're trying to get a good estimate here, not the correct result: 49 is close enough to 63 to be interesting. What we've got here is a universal estimate, which is not bad at all. And what if the exponent is odd? Well, we shift it back the same way we shifted it up. No big deal. In short, our first estimate now equals (in binary):

111 exponent/2
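In other words: the square root of the boosted mantissa has roughly half as many significant bits, so a run of ones half that wide makes a decent universal estimate. A sketch of that idea (the function name is my own):

```python
def first_estimate(mantissa):
    """Universal estimate: all ones, half the bit width of the mantissa."""
    half = (mantissa.bit_length() + 1) // 2
    return (1 << half) - 1          # e.g. 6 bits in -> 0b111 = 7

print(bin(first_estimate(0b111111)))  # 0b111, just like the example above
```

On a boosted 32-bit mantissa this yields a 15- or 16-bit run of ones, which is already in the right ballpark for the integer iteration to finish quickly.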

Now let's get rolling! On a 32 bit machine we crank out 4 to 5 correct digits for the mantissa in (on average) 3.5 iterations. E.g. the square root of 2 is returned as:

14142 * 10^-4

After just four integer iterations. To get 1.41421356 we just need two or three floating point iterations, which is not bad at all.

But of course, the proof is in the pudding. How fast is it? If we compare it against the ANS floating point FSQRT, it is over twice as fast. Then again, ANS floating point uses double word mantissas.

On the other hand, if its speed had merely been comparable, it would effectively have been the slower one, so I don't think we're doing so bad ;-)