From: Charles R H. <cha...@gm...> - 2006-09-22 18:14:29
On 9/22/06, Tim Hochberg <tim...@ie...> wrote:
>
> Sebastian Haase wrote:
> > On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
> >
> >> David M. Cooke wrote:
> >>
> >>> On Thu, 21 Sep 2006 11:34:42 -0700
> >>> Tim Hochberg <tim...@ie...> wrote:
> >>>
> >>>> Tim Hochberg wrote:
> >>>>
> >>>>> Robert Kern wrote:
> >>>>>
> >>>>>> David M. Cooke wrote:
> >>>>>>
> >>>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
> >>>>>>>
> >>>>>>>> Let me offer a third path: the algorithms used for .mean() and
> >>>>>>>> .var() are substandard. There are much better incremental
> >>>>>>>> algorithms that entirely avoid the need to accumulate such large
> >>>>>>>> (and therefore precision-losing) intermediate values. The
> >>>>>>>> algorithms look like the following for 1D arrays in Python:
> >>>>>>>>
> >>>>>>>> def mean(a):
> >>>>>>>>     m = a[0]
> >>>>>>>>     for i in range(1, len(a)):
> >>>>>>>>         m += (a[i] - m) / (i + 1)
> >>>>>>>>     return m
> >>>>>>>>
> >>>>>>> This isn't really going to be any better than using a simple sum.
> >>>>>>> It'll also be slower (a division per iteration).
> >>>>>>>
> >>>>>> With one exception, every test that I've thrown at it shows that it's
> >>>>>> better for float32. That exception is uniformly spaced arrays, like
> >>>>>> linspace().
> >>>>>>
> >>>>>> > You do avoid accumulating large sums, but then doing the division
> >>>>>> > a[i]/len(a) and adding that will do the same.
> >>>>>>
> >>>>>> Okay, this is true.
> >>>>>>
> >>>>>>> Now, if you want to avoid losing precision, you want to use a better
> >>>>>>> summation technique, like compensated (or Kahan) summation:
> >>>>>>>
> >>>>>>> def mean(a):
> >>>>>>>     s = e = a.dtype.type(0)
> >>>>>>>     for i in range(0, len(a)):
> >>>>>>>         temp = s
> >>>>>>>         y = a[i] + e
> >>>>>>>         s = temp + y
> >>>>>>>         e = (temp - s) + y
> >>>>>>>     return s / len(a)
> >>>>>>>
> >>>>>>>> def var(a):
> >>>>>>>>     m = a[0]
> >>>>>>>>     t = a.dtype.type(0)
> >>>>>>>>     for i in range(1, len(a)):
> >>>>>>>>         q = a[i] - m
> >>>>>>>>         r = q / (i+1)
> >>>>>>>>         m += r
> >>>>>>>>         t += i * q * r
> >>>>>>>>     t /= len(a)
> >>>>>>>>     return t
> >>>>>>>>
> >>>>>>>> Alternatively, from Knuth:
> >>>>>>>>
> >>>>>>>> def var_knuth(a):
> >>>>>>>>     m = a.dtype.type(0)
> >>>>>>>>     variance = a.dtype.type(0)
> >>>>>>>>     for i in range(len(a)):
> >>>>>>>>         delta = a[i] - m
> >>>>>>>>         m += delta / (i+1)
> >>>>>>>>         variance += delta * (a[i] - m)
> >>>>>>>>     variance /= len(a)
> >>>>>>>>     return variance
> >>>>>>>>
> >>>> I'm going to go ahead and attach a module containing the versions of
> >>>> mean, var, etc that I've been playing with in case someone wants to
> >>>> mess with them. Some were stolen from traffic on this list, for others
> >>>> I grabbed the algorithms from wikipedia or equivalent.
> >>>>
> >>> I looked into this a bit more. I checked float32 (single precision) and
> >>> float64 (double precision), using long doubles (float96) for the "exact"
> >>> results. This is based on your code. Results are compared using
> >>> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in
> >>> the range of [-100, 900].
> >>>
> >>> First, the mean. In float32, the Kahan summation in single precision is
> >>> better by about 2 orders of magnitude than simple summation. However,
> >>> accumulating the sum in double precision is better by about 9 orders of
> >>> magnitude than simple summation (7 orders more than Kahan).
> >>>
> >>> In float64, Kahan summation is the way to go, by 2 orders of magnitude.
> >>>
> >>> For the variance, in float32, Knuth's method is *no better* than the
> >>> two-pass method. Tim's code does an implicit conversion of intermediate
> >>> results to float64, which is why he saw a much better result.
> >>>
> >> Doh! And I fixed that same problem in the mean implementation earlier
> >> too. I was astounded by how well Knuth was doing, but not astounded
> >> enough apparently.
> >>
> >> Does it seem weird to anyone else that in:
> >>     numpy_scalar <op> python_scalar
> >> the precision ends up being controlled by the Python scalar? I would
> >> expect the numpy_scalar to control the resulting precision just like
> >> numpy arrays do in similar circumstances. Perhaps the egg on my face is
> >> just clouding my vision though.
> >>
> >>> The two-pass method using Kahan summation (again, in single precision)
> >>> is better by about 2 orders of magnitude. There is practically no
> >>> difference when using a double-precision accumulator amongst the
> >>> techniques: they're all about 9 orders of magnitude better than
> >>> single-precision two-pass.
> >>>
> >>> In float64, Kahan summation is again better than the rest, by about 2
> >>> orders of magnitude.
> >>>
> >>> I've put my adaptation of Tim's code, and box-and-whisker plots of the
> >>> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
> >>>
> >>> Conclusions:
> >>>
> >>> - If you're going to calculate everything in single precision, use
> >>>   Kahan summation. Using it in double precision also helps.
> >>> - If you can use a double-precision accumulator, it's much better than
> >>>   any of the techniques in single precision only.
> >>> - For speed + precision in the variance, either use Kahan summation in
> >>>   single precision with the two-pass method, or use double precision
> >>>   with simple summation with the two-pass method. Knuth buys you
> >>>   nothing, except slower code :-)
> >>>
> >> The two-pass methods are definitely more accurate. I won't be convinced
> >> on the speed front till I see comparable C implementations slug it out.
> >> That may well mean never in practice. However, I expect that somewhere
> >> around 10,000 items, the cache will overflow and memory bandwidth will
> >> become the bottleneck. At that point the extra operations of Knuth won't
> >> matter as much as making two passes through the array, and Knuth will
> >> win on speed. Of course the accuracy is pretty bad at single precision,
> >> so the possible, theoretical speed advantage at large sizes probably
> >> doesn't matter.
> >>
> >> -tim
> >>
> >>> After 1.0 is out, we should look at doing one of the above.
> >>>
> >> +1
> >>
> > Hi,
> > I don't understand much of these "fancy algorithms", but does the above
> > mean that after 1.0 is released the float32 functions will catch up, in
> > terms of precision ("to some extent"), with using dtype=float64?
> >
> It sounds like there is some consensus to do something to improve the
> precision after 1.0 comes out. I don't think the details are entirely
> nailed down, but it sounds like some combination of using Kahan summation
> and increasing the minimum precision of the accumulator is likely for
> sum, mean, var and std.

I would go with double as the default except when the base precision is
greater. Higher precisions should be available as a check, i.e., if the
results look funny, use Kahan or extended to see if roundoff is a problem.
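For example, just to illustrate the kind of check I have in mind (the data
below are made up, and this is only a sketch, not a proposal for the
implementation), the dtype argument to mean() already lets you redo a
float32 calculation with a double-precision accumulator and compare:

import numpy as np

# Made-up test data: 10000 float32 values roughly in [-100, 900],
# similar to the range David used in his comparison.
a = np.random.uniform(-100, 900, 10000).astype(np.float32)

m32 = a.mean()                  # accumulates in float32
m64 = a.mean(dtype=np.float64)  # same data, double-precision accumulator

# A large gap between the two (relative to float32 eps and the data scale)
# suggests roundoff in the single-precision accumulation.
print(m32, m64, abs(m32 - m64))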
I think Kahan should be available because it adds precision to *any* base
precision, even extended or quad, so there is always something to check
against. But I don't think Kahan should be the default because of the
speed hit.

Chuck
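P.S. In case a concrete cross-check helps, here is a direct NumPy
translation of the compensated-summation mean quoted above; the name
kahan_mean and the test data are just my illustration, not a proposed
implementation.

import numpy as np

def kahan_mean(a):
    # Mean via compensated (Kahan) summation, carried out in the array's
    # own precision; intended as a correctness check, not as fast code.
    s = e = a.dtype.type(0)
    for x in a.ravel():
        y = x + e            # fold in the error recovered so far
        t = s + y
        e = (s - t) + y      # recover the low-order bits lost in s + y
        s = t
    # Cast the length so the division stays in the array's precision
    # (cf. Tim's numpy_scalar <op> python_scalar point above).
    return s / a.dtype.type(a.size)

# Three-way check: plain float32 mean, Kahan in float32, double accumulator.
a = np.random.uniform(-100, 900, 10000).astype(np.float32)
print(a.mean(), kahan_mean(a), a.mean(dtype=np.float64))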