From: Robert K. <rob...@gm...> - 2006-09-20 00:19:13
Sebastian Haase wrote:
> (don't know how to say this for complex types !? Are here real and imag
> treated separately / independently ?)

Yes. For mean(), there's really no alternative. Scalar variance is not a well-defined concept for complex numbers, but treating the real and imaginary parts separately is a sensible and (partially) informative thing to do. Simply applying the formula for estimating variance for real numbers to complex numbers (i.e. change "x" to "z") is a meaningless operation.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth." -- Umberto Eco
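A minimal sketch of the treatment Robert describes; the helper name complex_var is introduced here only for illustration. The real and imaginary parts are handled as two independent real samples, rather than plugging complex z straight into the real-valued formula.

import numpy as np

def complex_var(z):
    """Variance of a complex sample, treating the real and imaginary
    parts as two independent real samples (per the suggestion above)."""
    z = np.asarray(z)
    return z.real.var(), z.imag.var()

# mean() needs no special handling: the complex mean is just
# mean(real) + 1j*mean(imag), which is what the ordinary sum gives.
z = np.array([1 + 2j, 3 - 1j, -2 + 0.5j])
print(np.mean(z))        # complex mean
print(complex_var(z))    # (variance of real part, variance of imag part)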
From: David M. C. <co...@ph...> - 2006-09-21 21:59:43
On Thu, 21 Sep 2006 11:34:42 -0700 Tim Hochberg <tim...@ie...> wrote: > Tim Hochberg wrote: > > Robert Kern wrote: > > > >> David M. Cooke wrote: > >> > >> > >>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>> > >>> > >>>> Let me offer a third path: the algorithms used for .mean() and .var() > >>>> are substandard. There are much better incremental algorithms that > >>>> entirely avoid the need to accumulate such large (and therefore > >>>> precision-losing) intermediate values. The algorithms look like the > >>>> following for 1D arrays in Python: > >>>> > >>>> def mean(a): > >>>> m = a[0] > >>>> for i in range(1, len(a)): > >>>> m += (a[i] - m) / (i + 1) > >>>> return m > >>>> > >>>> > >>> This isn't really going to be any better than using a simple sum. > >>> It'll also be slower (a division per iteration). > >>> > >>> > >> With one exception, every test that I've thrown at it shows that it's > >> better for float32. That exception is uniformly spaced arrays, like > >> linspace(). > >> > >> > You do avoid > >> > accumulating large sums, but then doing the division a[i]/len(a) and > >> > adding that will do the same. > >> > >> Okay, this is true. > >> > >> > >> > >>> Now, if you want to avoid losing precision, you want to use a better > >>> summation technique, like compensated (or Kahan) summation: > >>> > >>> def mean(a): > >>> s = e = a.dtype.type(0) > >>> for i in range(0, len(a)): > >>> temp = s > >>> y = a[i] + e > >>> s = temp + y > >>> e = (temp - s) + y > >>> return s / len(a) > >> > >>>> def var(a): > >>>> m = a[0] > >>>> t = a.dtype.type(0) > >>>> for i in range(1, len(a)): > >>>> q = a[i] - m > >>>> r = q / (i+1) > >>>> m += r > >>>> t += i * q * r > >>>> t /= len(a) > >>>> return t > >>>> > >>>> Alternatively, from Knuth: > >>>> > >>>> def var_knuth(a): > >>>> m = a.dtype.type(0) > >>>> variance = a.dtype.type(0) > >>>> for i in range(len(a)): > >>>> delta = a[i] - m > >>>> m += delta / (i+1) > >>>> variance += delta * (a[i] - m) > >>>> variance /= len(a) > >>>> return variance > > I'm going to go ahead and attach a module containing the versions of > mean, var, etc that I've been playing with in case someone wants to mess > with them. Some were stolen from traffic on this list, for others I > grabbed the algorithms from wikipedia or equivalent. I looked into this a bit more. I checked float32 (single precision) and float64 (double precision), using long doubles (float96) for the "exact" results. This is based on your code. Results are compared using abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the range of [-100, 900] First, the mean. In float32, the Kahan summation in single precision is better by about 2 orders of magnitude than simple summation. However, accumulating the sum in double precision is better by about 9 orders of magnitude than simple summation (7 orders more than Kahan). In float64, Kahan summation is the way to go, by 2 orders of magnitude. For the variance, in float32, Knuth's method is *no better* than the two-pass method. Tim's code does an implicit conversion of intermediate results to float64, which is why he saw a much better result. The two-pass method using Kahan summation (again, in single precision), is better by about 2 orders of magnitude. There is practically no difference when using a double-precision accumulator amongst the techniques: they're all about 9 orders of magnitude better than single-precision two-pass. 
In float64, Kahan summation is again better than the rest, by about 2 orders of magnitude. I've put my adaptation of Tim's code, and box-and-whisker plots of the results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ Conclusions: - If you're going to calculate everything in single precision, use Kahan summation. Using it in double-precision also helps. - If you can use a double-precision accumulator, it's much better than any of the techniques in single-precision only. - for speed+precision in the variance, either use Kahan summation in single precision with the two-pass method, or use double precision with simple summation with the two-pass method. Knuth buys you nothing, except slower code :-) After 1.0 is out, we should look at doing one of the above. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |co...@ph... |
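The code quoted back and forth in this thread is hard to read with its line breaks collapsed, so here is a runnable restatement of the main candidates, kept close to the versions Robert and David posted; the two-pass variant is assembled from their descriptions. One deliberate change: the integer loop counts are wrapped in a.dtype.type(...) so every intermediate stays in the array's own precision, which is exactly the implicit-float64 pitfall discussed below. The harness at the end follows David's description (10000 values in [-100, 900], error measured as abs(exact - computed) / max(abs(values)) against a long-double reference); the random data is only a stand-in for whatever he actually used.

import numpy as np

def mean_incremental(a):
    # Robert's running mean: m_k = m_{k-1} + (a[k] - m_{k-1}) / (k + 1)
    m = a[0]
    for i in range(1, len(a)):
        m += (a[i] - m) / a.dtype.type(i + 1)
    return m

def mean_kahan(a):
    # David's compensated (Kahan) summation, with a single division at the end.
    s = e = a.dtype.type(0)
    for i in range(len(a)):
        temp = s
        y = a[i] + e
        s = temp + y
        e = (temp - s) + y        # low-order bits lost when forming s
    return s / a.dtype.type(len(a))

def var_knuth(a):
    # Knuth's one-pass variance: running mean plus running sum of delta*(x - m).
    m = a.dtype.type(0)
    variance = a.dtype.type(0)
    for i in range(len(a)):
        delta = a[i] - m
        m += delta / a.dtype.type(i + 1)
        variance += delta * (a[i] - m)
    return variance / a.dtype.type(len(a))

def var_two_pass_kahan(a):
    # Two-pass variance, using Kahan summation for both passes.
    m = mean_kahan(a)
    d = a - m
    return mean_kahan(d * d)

# Error metric as described above, against a long-double "exact" value.
rng = np.random.RandomState(0)                     # stand-in test data
values = rng.uniform(-100, 900, 10000).astype(np.float32)
exact_mean = np.mean(values.astype(np.longdouble))
scale = float(np.abs(values).max())
for f in (mean_incremental, mean_kahan):
    print(f.__name__, abs(float(exact_mean) - float(f(values))) / scale)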
From: Travis O. <oli...@ee...> - 2006-09-21 22:07:56
David M. Cooke wrote:
> Conclusions:
>
> - If you're going to calculate everything in single precision, use Kahan
>   summation. Using it in double-precision also helps.
>
> - If you can use a double-precision accumulator, it's much better than any
>   of the techniques in single-precision only.
>
> - for speed+precision in the variance, either use Kahan summation in single
>   precision with the two-pass method, or use double precision with simple
>   summation with the two-pass method. Knuth buys you nothing, except slower
>   code :-)
>
> After 1.0 is out, we should look at doing one of the above.

+1
From: Tim H. <tim...@ie...> - 2006-09-21 22:28:43
David M. Cooke wrote: > On Thu, 21 Sep 2006 11:34:42 -0700 > Tim Hochberg <tim...@ie...> wrote: > > >> Tim Hochberg wrote: >> >>> Robert Kern wrote: >>> >>> >>>> David M. Cooke wrote: >>>> >>>> >>>> >>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: >>>>> >>>>> >>>>> >>>>>> Let me offer a third path: the algorithms used for .mean() and .var() >>>>>> are substandard. There are much better incremental algorithms that >>>>>> entirely avoid the need to accumulate such large (and therefore >>>>>> precision-losing) intermediate values. The algorithms look like the >>>>>> following for 1D arrays in Python: >>>>>> >>>>>> def mean(a): >>>>>> m = a[0] >>>>>> for i in range(1, len(a)): >>>>>> m += (a[i] - m) / (i + 1) >>>>>> return m >>>>>> >>>>>> >>>>>> >>>>> This isn't really going to be any better than using a simple sum. >>>>> It'll also be slower (a division per iteration). >>>>> >>>>> >>>>> >>>> With one exception, every test that I've thrown at it shows that it's >>>> better for float32. That exception is uniformly spaced arrays, like >>>> linspace(). >>>> >>>> > You do avoid >>>> > accumulating large sums, but then doing the division a[i]/len(a) and >>>> > adding that will do the same. >>>> >>>> Okay, this is true. >>>> >>>> >>>> >>>> >>>>> Now, if you want to avoid losing precision, you want to use a better >>>>> summation technique, like compensated (or Kahan) summation: >>>>> >>>>> def mean(a): >>>>> s = e = a.dtype.type(0) >>>>> for i in range(0, len(a)): >>>>> temp = s >>>>> y = a[i] + e >>>>> s = temp + y >>>>> e = (temp - s) + y >>>>> return s / len(a) >>>>> >>>> >>>> >>>>>> def var(a): >>>>>> m = a[0] >>>>>> t = a.dtype.type(0) >>>>>> for i in range(1, len(a)): >>>>>> q = a[i] - m >>>>>> r = q / (i+1) >>>>>> m += r >>>>>> t += i * q * r >>>>>> t /= len(a) >>>>>> return t >>>>>> >>>>>> Alternatively, from Knuth: >>>>>> >>>>>> def var_knuth(a): >>>>>> m = a.dtype.type(0) >>>>>> variance = a.dtype.type(0) >>>>>> for i in range(len(a)): >>>>>> delta = a[i] - m >>>>>> m += delta / (i+1) >>>>>> variance += delta * (a[i] - m) >>>>>> variance /= len(a) >>>>>> return variance >>>>>> >> I'm going to go ahead and attach a module containing the versions of >> mean, var, etc that I've been playing with in case someone wants to mess >> with them. Some were stolen from traffic on this list, for others I >> grabbed the algorithms from wikipedia or equivalent. >> > > I looked into this a bit more. I checked float32 (single precision) and > float64 (double precision), using long doubles (float96) for the "exact" > results. This is based on your code. Results are compared using > abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the > range of [-100, 900] > > First, the mean. In float32, the Kahan summation in single precision is > better by about 2 orders of magnitude than simple summation. However, > accumulating the sum in double precision is better by about 9 orders of > magnitude than simple summation (7 orders more than Kahan). > > In float64, Kahan summation is the way to go, by 2 orders of magnitude. > > For the variance, in float32, Knuth's method is *no better* than the two-pass > method. Tim's code does an implicit conversion of intermediate results to > float64, which is why he saw a much better result. Doh! And I fixed that same problem in the mean implementation earlier too. I was astounded by how good knuth was doing, but not astounded enough apparently. 
Does it seem weird to anyone else that in: numpy_scalar <op> python_scalar the precision ends up being controlled by the python scalar? I would expect the numpy_scalar to control the resulting precision just like numpy arrays do in similar circumstances. Perhaps the egg on my face is just clouding my vision though. > The two-pass method using > Kahan summation (again, in single precision), is better by about 2 orders of > magnitude. There is practically no difference when using a double-precision > accumulator amongst the techniques: they're all about 9 orders of magnitude > better than single-precision two-pass. > > In float64, Kahan summation is again better than the rest, by about 2 orders > of magnitude. > > I've put my adaptation of Tim's code, and box-and-whisker plots of the > results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > > Conclusions: > > - If you're going to calculate everything in single precision, use Kahan > summation. Using it in double-precision also helps. > - If you can use a double-precision accumulator, it's much better than any of > the techniques in single-precision only. > > - for speed+precision in the variance, either use Kahan summation in single > precision with the two-pass method, or use double precision with simple > summation with the two-pass method. Knuth buys you nothing, except slower > code :-) > The two pass methods are definitely more accurate. I won't be convinced on the speed front till I see comparable C implementations slug it out. That may well mean never in practice. However, I expect that somewhere around 10,000 items, the cache will overflow and memory bandwidth will become the bottleneck. At that point the extra operations of Knuth won't matter as much as making two passes through the array and Knuth will win on speed. Of course the accuracy is pretty bad at single precision, so the possible, theoretical speed advantage at large sizes probably doesn't matter. -tim > After 1.0 is out, we should look at doing one of the above. > +1 |
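The promotion quirk Tim raises is easy to poke at directly. Note that this is version-dependent behaviour (NumPy's scalar promotion rules were reworked much later), so the snippet below is a way to check what a given installation does rather than a statement of what it must do:

import numpy as np

x = np.float32(1.0)       # a numpy scalar, e.g. the result of indexing a float32 array
print((x + 1.0).dtype)    # mixed with a Python float: which side controls the precision?
print((x / 3).dtype)      # mixed with a Python int: the delta / (i+1) case from var_knuth

a = np.float32([1.0, 2.0])
print((a + 1.0).dtype)    # a float32 *array* keeps float32 against the same Python scalar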
From: Sebastian H. <ha...@ms...> - 2006-09-21 23:22:11
On Thursday 21 September 2006 15:28, Tim Hochberg wrote: > David M. Cooke wrote: > > On Thu, 21 Sep 2006 11:34:42 -0700 > > > > Tim Hochberg <tim...@ie...> wrote: > >> Tim Hochberg wrote: > >>> Robert Kern wrote: > >>>> David M. Cooke wrote: > >>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>>>>> Let me offer a third path: the algorithms used for .mean() and > >>>>>> .var() are substandard. There are much better incremental algorithms > >>>>>> that entirely avoid the need to accumulate such large (and therefore > >>>>>> precision-losing) intermediate values. The algorithms look like the > >>>>>> following for 1D arrays in Python: > >>>>>> > >>>>>> def mean(a): > >>>>>> m = a[0] > >>>>>> for i in range(1, len(a)): > >>>>>> m += (a[i] - m) / (i + 1) > >>>>>> return m > >>>>> > >>>>> This isn't really going to be any better than using a simple sum. > >>>>> It'll also be slower (a division per iteration). > >>>> > >>>> With one exception, every test that I've thrown at it shows that it's > >>>> better for float32. That exception is uniformly spaced arrays, like > >>>> linspace(). > >>>> > >>>> > You do avoid > >>>> > accumulating large sums, but then doing the division a[i]/len(a) > >>>> > and adding that will do the same. > >>>> > >>>> Okay, this is true. > >>>> > >>>>> Now, if you want to avoid losing precision, you want to use a better > >>>>> summation technique, like compensated (or Kahan) summation: > >>>>> > >>>>> def mean(a): > >>>>> s = e = a.dtype.type(0) > >>>>> for i in range(0, len(a)): > >>>>> temp = s > >>>>> y = a[i] + e > >>>>> s = temp + y > >>>>> e = (temp - s) + y > >>>>> return s / len(a) > >>>>> > >>>>>> def var(a): > >>>>>> m = a[0] > >>>>>> t = a.dtype.type(0) > >>>>>> for i in range(1, len(a)): > >>>>>> q = a[i] - m > >>>>>> r = q / (i+1) > >>>>>> m += r > >>>>>> t += i * q * r > >>>>>> t /= len(a) > >>>>>> return t > >>>>>> > >>>>>> Alternatively, from Knuth: > >>>>>> > >>>>>> def var_knuth(a): > >>>>>> m = a.dtype.type(0) > >>>>>> variance = a.dtype.type(0) > >>>>>> for i in range(len(a)): > >>>>>> delta = a[i] - m > >>>>>> m += delta / (i+1) > >>>>>> variance += delta * (a[i] - m) > >>>>>> variance /= len(a) > >>>>>> return variance > >> > >> I'm going to go ahead and attach a module containing the versions of > >> mean, var, etc that I've been playing with in case someone wants to mess > >> with them. Some were stolen from traffic on this list, for others I > >> grabbed the algorithms from wikipedia or equivalent. > > > > I looked into this a bit more. I checked float32 (single precision) and > > float64 (double precision), using long doubles (float96) for the "exact" > > results. This is based on your code. Results are compared using > > abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in > > the range of [-100, 900] > > > > First, the mean. In float32, the Kahan summation in single precision is > > better by about 2 orders of magnitude than simple summation. However, > > accumulating the sum in double precision is better by about 9 orders of > > magnitude than simple summation (7 orders more than Kahan). > > > > In float64, Kahan summation is the way to go, by 2 orders of magnitude. > > > > For the variance, in float32, Knuth's method is *no better* than the > > two-pass method. Tim's code does an implicit conversion of intermediate > > results to float64, which is why he saw a much better result. > > Doh! And I fixed that same problem in the mean implementation earlier > too. 
I was astounded by how good knuth was doing, but not astounded > enough apparently. > > Does it seem weird to anyone else that in: > numpy_scalar <op> python_scalar > the precision ends up being controlled by the python scalar? I would > expect the numpy_scalar to control the resulting precision just like > numpy arrays do in similar circumstances. Perhaps the egg on my face is > just clouding my vision though. > > > The two-pass method using > > Kahan summation (again, in single precision), is better by about 2 orders > > of magnitude. There is practically no difference when using a > > double-precision accumulator amongst the techniques: they're all about 9 > > orders of magnitude better than single-precision two-pass. > > > > In float64, Kahan summation is again better than the rest, by about 2 > > orders of magnitude. > > > > I've put my adaptation of Tim's code, and box-and-whisker plots of the > > results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > > > > Conclusions: > > > > - If you're going to calculate everything in single precision, use Kahan > > summation. Using it in double-precision also helps. > > - If you can use a double-precision accumulator, it's much better than > > any of the techniques in single-precision only. > > > > - for speed+precision in the variance, either use Kahan summation in > > single precision with the two-pass method, or use double precision with > > simple summation with the two-pass method. Knuth buys you nothing, except > > slower code :-) > > The two pass methods are definitely more accurate. I won't be convinced > on the speed front till I see comparable C implementations slug it out. > That may well mean never in practice. However, I expect that somewhere > around 10,000 items, the cache will overflow and memory bandwidth will > become the bottleneck. At that point the extra operations of Knuth won't > matter as much as making two passes through the array and Knuth will win > on speed. Of course the accuracy is pretty bad at single precision, so > the possible, theoretical speed advantage at large sizes probably > doesn't matter. > > -tim > > > After 1.0 is out, we should look at doing one of the above. > > +1 > Hi, I don't understand much of these "fancy algorithms", but does the above mean that after 1.0 is released the float32 functions will catch up in terms of precision precision ("to some extend") with using dtype=float64 ? - Sebastian Haase |
From: Charles R H. <cha...@gm...> - 2006-09-22 18:14:29
On 9/22/06, Tim Hochberg <tim...@ie...> wrote: > > Sebastian Haase wrote: > > On Thursday 21 September 2006 15:28, Tim Hochberg wrote: > > > >> David M. Cooke wrote: > >> > >>> On Thu, 21 Sep 2006 11:34:42 -0700 > >>> > >>> Tim Hochberg <tim...@ie...> wrote: > >>> > >>>> Tim Hochberg wrote: > >>>> > >>>>> Robert Kern wrote: > >>>>> > >>>>>> David M. Cooke wrote: > >>>>>> > >>>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>>>>>> > >>>>>>>> Let me offer a third path: the algorithms used for .mean() and > >>>>>>>> .var() are substandard. There are much better incremental > algorithms > >>>>>>>> that entirely avoid the need to accumulate such large (and > therefore > >>>>>>>> precision-losing) intermediate values. The algorithms look like > the > >>>>>>>> following for 1D arrays in Python: > >>>>>>>> > >>>>>>>> def mean(a): > >>>>>>>> m = a[0] > >>>>>>>> for i in range(1, len(a)): > >>>>>>>> m += (a[i] - m) / (i + 1) > >>>>>>>> return m > >>>>>>>> > >>>>>>> This isn't really going to be any better than using a simple sum. > >>>>>>> It'll also be slower (a division per iteration). > >>>>>>> > >>>>>> With one exception, every test that I've thrown at it shows that > it's > >>>>>> better for float32. That exception is uniformly spaced arrays, like > >>>>>> linspace(). > >>>>>> > >>>>>> > You do avoid > >>>>>> > accumulating large sums, but then doing the division a[i]/len(a) > >>>>>> > and adding that will do the same. > >>>>>> > >>>>>> Okay, this is true. > >>>>>> > >>>>>> > >>>>>>> Now, if you want to avoid losing precision, you want to use a > better > >>>>>>> summation technique, like compensated (or Kahan) summation: > >>>>>>> > >>>>>>> def mean(a): > >>>>>>> s = e = a.dtype.type(0) > >>>>>>> for i in range(0, len(a)): > >>>>>>> temp = s > >>>>>>> y = a[i] + e > >>>>>>> s = temp + y > >>>>>>> e = (temp - s) + y > >>>>>>> return s / len(a) > >>>>>>> > >>>>>>> > >>>>>>>> def var(a): > >>>>>>>> m = a[0] > >>>>>>>> t = a.dtype.type(0) > >>>>>>>> for i in range(1, len(a)): > >>>>>>>> q = a[i] - m > >>>>>>>> r = q / (i+1) > >>>>>>>> m += r > >>>>>>>> t += i * q * r > >>>>>>>> t /= len(a) > >>>>>>>> return t > >>>>>>>> > >>>>>>>> Alternatively, from Knuth: > >>>>>>>> > >>>>>>>> def var_knuth(a): > >>>>>>>> m = a.dtype.type(0) > >>>>>>>> variance = a.dtype.type(0) > >>>>>>>> for i in range(len(a)): > >>>>>>>> delta = a[i] - m > >>>>>>>> m += delta / (i+1) > >>>>>>>> variance += delta * (a[i] - m) > >>>>>>>> variance /= len(a) > >>>>>>>> return variance > >>>>>>>> > >>>> I'm going to go ahead and attach a module containing the versions of > >>>> mean, var, etc that I've been playing with in case someone wants to > mess > >>>> with them. Some were stolen from traffic on this list, for others I > >>>> grabbed the algorithms from wikipedia or equivalent. > >>>> > >>> I looked into this a bit more. I checked float32 (single precision) > and > >>> float64 (double precision), using long doubles (float96) for the > "exact" > >>> results. This is based on your code. Results are compared using > >>> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values > in > >>> the range of [-100, 900] > >>> > >>> First, the mean. In float32, the Kahan summation in single precision > is > >>> better by about 2 orders of magnitude than simple summation. However, > >>> accumulating the sum in double precision is better by about 9 orders > of > >>> magnitude than simple summation (7 orders more than Kahan). 
> >>> > >>> In float64, Kahan summation is the way to go, by 2 orders of > magnitude. > >>> > >>> For the variance, in float32, Knuth's method is *no better* than the > >>> two-pass method. Tim's code does an implicit conversion of > intermediate > >>> results to float64, which is why he saw a much better result. > >>> > >> Doh! And I fixed that same problem in the mean implementation earlier > >> too. I was astounded by how good knuth was doing, but not astounded > >> enough apparently. > >> > >> Does it seem weird to anyone else that in: > >> numpy_scalar <op> python_scalar > >> the precision ends up being controlled by the python scalar? I would > >> expect the numpy_scalar to control the resulting precision just like > >> numpy arrays do in similar circumstances. Perhaps the egg on my face is > >> just clouding my vision though. > >> > >> > >>> The two-pass method using > >>> Kahan summation (again, in single precision), is better by about 2 > orders > >>> of magnitude. There is practically no difference when using a > >>> double-precision accumulator amongst the techniques: they're all about > 9 > >>> orders of magnitude better than single-precision two-pass. > >>> > >>> In float64, Kahan summation is again better than the rest, by about 2 > >>> orders of magnitude. > >>> > >>> I've put my adaptation of Tim's code, and box-and-whisker plots of the > >>> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > >>> > >>> Conclusions: > >>> > >>> - If you're going to calculate everything in single precision, use > Kahan > >>> summation. Using it in double-precision also helps. > >>> - If you can use a double-precision accumulator, it's much better than > >>> any of the techniques in single-precision only. > >>> > >>> - for speed+precision in the variance, either use Kahan summation in > >>> single precision with the two-pass method, or use double precision > with > >>> simple summation with the two-pass method. Knuth buys you nothing, > except > >>> slower code :-) > >>> > >> The two pass methods are definitely more accurate. I won't be convinced > >> on the speed front till I see comparable C implementations slug it out. > >> That may well mean never in practice. However, I expect that somewhere > >> around 10,000 items, the cache will overflow and memory bandwidth will > >> become the bottleneck. At that point the extra operations of Knuth > won't > >> matter as much as making two passes through the array and Knuth will > win > >> on speed. Of course the accuracy is pretty bad at single precision, so > >> the possible, theoretical speed advantage at large sizes probably > >> doesn't matter. > >> > >> -tim > >> > >> > >>> After 1.0 is out, we should look at doing one of the above. > >>> > >> +1 > >> > >> > > Hi, > > I don't understand much of these "fancy algorithms", but does the above > mean > > that > > after 1.0 is released the float32 functions will catch up in terms of > > precision precision ("to some extend") with using dtype=float64 ? > > > It sounds like there is some consensus to do something to improve the > precision after 1.0 comes out. I don't think the details are entirely > nailed down, but it sounds like some combination of using Kahan > summation and increasing the minimum precision of the accumulator is > likely for sum, mean, var and std. I would go with double as the default except when the base precision is greater. Higher precisions should be available as a check, i.e., if the results look funny use Kahan or extended to see if roundoff is a problem. 
I think Kahan should be available because it adds precision to *any* base precision, even extended or quad, so there is always something to check against. But I don't think Kahan should be the default because of the speed hit. Chuck |
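For what it's worth, a higher-precision accumulator is already reachable through the dtype argument that sum(), mean(), var() and std() accept, so the double-by-default behaviour Chuck describes can be had explicitly today; a quick sketch:

import numpy as np

a = np.random.uniform(-100, 900, 10000).astype(np.float32)

m32 = a.mean()                       # accumulate in the array's own precision
m64 = a.mean(dtype=np.float64)       # explicit double-precision accumulator
mld = a.mean(dtype=np.longdouble)    # extended precision as a stricter cross-check

print(m32, m64, mld)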
From: Tim H. <tim...@ie...> - 2006-09-22 19:18:34
Charles R Harris wrote: [CHOP] > On 9/22/06, *Tim Hochberg* <tim...@ie... > <mailto:tim...@ie...>> wrote: > > > It sounds like there is some consensus to do something to improve the > precision after 1.0 comes out. I don't think the details are entirely > nailed down, but it sounds like some combination of using Kahan > summation and increasing the minimum precision of the accumulator is > likely for sum, mean, var and std. > > > I would go with double as the default except when the base precision > is greater. Higher precisions should be available as a check, i.e., if > the results look funny use Kahan or extended to see if roundoff is a > problem. I think Kahan should be available because it adds precision > to *any* base precision, even extended or quad, so there is always > something to check against. But I don't think Kahan should be the > default because of the speed hit. > Note that add.reduce will be available for doing simple-fast-reductions, so I consider some speed hit as acceptable. I'll be interested to see what the actual speed difference between Kahan and straight summation actually are. It may also be possible to unroll the core loop cleverly so that multiple ops can be scheduled on the FPU in parallel. It doesn't look like naive unrolling will work since each iteration depends on the values of the previous one. However, it probably doesn't matter much where the low order bits get added back in, so there's some flexibility in how the loop gets unrolled. -tim PS. Sorry about the untrimmed posts. |
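One possible reading of the unrolling idea, sketched in Python for clarity even though the payoff only exists in C: run two independent Kahan accumulators over interleaved elements so there are two independent dependency chains, then fold the partial sums and their corrections together at the end. This is an assumption about what the "clever unrolling" might look like, not a measured design.

import numpy as np

def kahan_sum_unrolled2(a):
    # Two interleaved compensated accumulators; each (s, e) pair is an
    # ordinary Kahan accumulator over every second element.
    zero = a.dtype.type(0)
    s0 = e0 = s1 = e1 = zero
    n = len(a)
    for i in range(0, n - 1, 2):
        y = a[i] + e0
        t = s0 + y
        e0 = y - (t - s0)
        s0 = t
        y = a[i + 1] + e1
        t = s1 + y
        e1 = y - (t - s1)
        s1 = t
    if n % 2:                         # odd-length input: one leftover element
        y = a[n - 1] + e0
        t = s0 + y
        e0 = y - (t - s0)
        s0 = t
    # It matters little where the low-order bits are added back in, so the
    # two corrections are simply folded into the final combination.
    return (s0 + s1) + (e0 + e1)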
From: Tim H. <tim...@ie...> - 2006-09-21 22:57:10
David M. Cooke wrote: > On Thu, 21 Sep 2006 11:34:42 -0700 > Tim Hochberg <tim...@ie...> wrote: > > >> Tim Hochberg wrote: >> >>> Robert Kern wrote: >>> >>> >>>> David M. Cooke wrote: >>>> >>>> >>>> >>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: >>>>> >>>>> >>>>> >>>>>> Let me offer a third path: the algorithms used for .mean() and .var() >>>>>> are substandard. There are much better incremental algorithms that >>>>>> entirely avoid the need to accumulate such large (and therefore >>>>>> precision-losing) intermediate values. The algorithms look like the >>>>>> following for 1D arrays in Python: >>>>>> >>>>>> def mean(a): >>>>>> m = a[0] >>>>>> for i in range(1, len(a)): >>>>>> m += (a[i] - m) / (i + 1) >>>>>> return m >>>>>> >>>>>> >>>>>> >>>>> This isn't really going to be any better than using a simple sum. >>>>> It'll also be slower (a division per iteration). >>>>> >>>>> >>>>> >>>> With one exception, every test that I've thrown at it shows that it's >>>> better for float32. That exception is uniformly spaced arrays, like >>>> linspace(). >>>> >>>> > You do avoid >>>> > accumulating large sums, but then doing the division a[i]/len(a) and >>>> > adding that will do the same. >>>> >>>> Okay, this is true. >>>> >>>> >>>> >>>> >>>>> Now, if you want to avoid losing precision, you want to use a better >>>>> summation technique, like compensated (or Kahan) summation: >>>>> >>>>> def mean(a): >>>>> s = e = a.dtype.type(0) >>>>> for i in range(0, len(a)): >>>>> temp = s >>>>> y = a[i] + e >>>>> s = temp + y >>>>> e = (temp - s) + y >>>>> return s / len(a) >>>>> >>>> >>>> >>>>>> def var(a): >>>>>> m = a[0] >>>>>> t = a.dtype.type(0) >>>>>> for i in range(1, len(a)): >>>>>> q = a[i] - m >>>>>> r = q / (i+1) >>>>>> m += r >>>>>> t += i * q * r >>>>>> t /= len(a) >>>>>> return t >>>>>> >>>>>> Alternatively, from Knuth: >>>>>> >>>>>> def var_knuth(a): >>>>>> m = a.dtype.type(0) >>>>>> variance = a.dtype.type(0) >>>>>> for i in range(len(a)): >>>>>> delta = a[i] - m >>>>>> m += delta / (i+1) >>>>>> variance += delta * (a[i] - m) >>>>>> variance /= len(a) >>>>>> return variance >>>>>> >> I'm going to go ahead and attach a module containing the versions of >> mean, var, etc that I've been playing with in case someone wants to mess >> with them. Some were stolen from traffic on this list, for others I >> grabbed the algorithms from wikipedia or equivalent. >> > > I looked into this a bit more. I checked float32 (single precision) and > float64 (double precision), using long doubles (float96) for the "exact" > results. This is based on your code. Results are compared using > abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the > range of [-100, 900] > > First, the mean. In float32, the Kahan summation in single precision is > better by about 2 orders of magnitude than simple summation. However, > accumulating the sum in double precision is better by about 9 orders of > magnitude than simple summation (7 orders more than Kahan). > > In float64, Kahan summation is the way to go, by 2 orders of magnitude. > > For the variance, in float32, Knuth's method is *no better* than the two-pass > method. Tim's code does an implicit conversion of intermediate results to > float64, which is why he saw a much better result. The two-pass method using > Kahan summation (again, in single precision), is better by about 2 orders of > magnitude. 
There is practically no difference when using a double-precision > accumulator amongst the techniques: they're all about 9 orders of magnitude > better than single-precision two-pass. > > In float64, Kahan summation is again better than the rest, by about 2 orders > of magnitude. > > I've put my adaptation of Tim's code, and box-and-whisker plots of the > results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > > Conclusions: > > - If you're going to calculate everything in single precision, use Kahan > summation. Using it in double-precision also helps. > - If you can use a double-precision accumulator, it's much better than any of > the techniques in single-precision only. > > - for speed+precision in the variance, either use Kahan summation in single > precision with the two-pass method, or use double precision with simple > summation with the two-pass method. Knuth buys you nothing, except slower > code :-) > > After 1.0 is out, we should look at doing one of the above. > One more little tidbit; it appears possible to "fix up" Knuth's algorithm so that it's comparable in accuracy to the two pass Kahan version by doing Kahan summation while accumulating the variance. Testing on this was far from thorough, but in the tests I did it nearly always produced identical results to kahan. Of course this is even slower than the original Knuth version, but it's interesting anyway: # This is probably messier than it need to be # but I'm out of time for today... def var_knuth2(values, dtype=default_prec): """var(values) -> variance of values computed using Knuth's one pass algorithm""" m = variance = mc = vc = dtype(0) for i, x in enumerate(values): delta = values[i] - m y = delta / dtype(i+1) + mc t = m + y mc = y - (t - m) m = t y = delta * (x - m) + vc t = variance + y vc = y - (t - variance) variance = t assert type(variance) == dtype variance /= dtype(len(values)) return variance |
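Tim's var_knuth2 is hard to read with its indentation lost; the same routine, reformatted and made self-contained, follows. The only change is that his module-level default_prec is replaced by an explicit dtype argument (values is expected to be an array of that dtype, otherwise the assert that guards against silent upcasting will fire).

import numpy as np

def var_knuth2(values, dtype=np.float32):
    """var(values) -> variance computed with Knuth's one-pass algorithm,
    with Kahan compensation applied to both running quantities."""
    m = variance = mc = vc = dtype(0)
    for i, x in enumerate(values):
        # compensated update of the running mean m (mc holds its correction)
        delta = values[i] - m
        y = delta / dtype(i + 1) + mc
        t = m + y
        mc = y - (t - m)
        m = t
        # compensated update of the accumulated variance (vc holds its correction)
        y = delta * (x - m) + vc
        t = variance + y
        vc = y - (t - variance)
        variance = t
    assert type(variance) == dtype    # nothing silently promoted to float64
    variance /= dtype(len(values))
    return variance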
From: Tim H. <tim...@ie...> - 2006-09-22 13:09:05
Tim Hochberg wrote: > David M. Cooke wrote: > >> On Thu, 21 Sep 2006 11:34:42 -0700 >> Tim Hochberg <tim...@ie...> wrote: >> >> >> >>> Tim Hochberg wrote: >>> >>> >>>> Robert Kern wrote: >>>> >>>> >>>> >>>>> David M. Cooke wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Let me offer a third path: the algorithms used for .mean() and .var() >>>>>>> are substandard. There are much better incremental algorithms that >>>>>>> entirely avoid the need to accumulate such large (and therefore >>>>>>> precision-losing) intermediate values. The algorithms look like the >>>>>>> following for 1D arrays in Python: >>>>>>> >>>>>>> def mean(a): >>>>>>> m = a[0] >>>>>>> for i in range(1, len(a)): >>>>>>> m += (a[i] - m) / (i + 1) >>>>>>> return m >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> This isn't really going to be any better than using a simple sum. >>>>>> It'll also be slower (a division per iteration). >>>>>> >>>>>> >>>>>> >>>>>> >>>>> With one exception, every test that I've thrown at it shows that it's >>>>> better for float32. That exception is uniformly spaced arrays, like >>>>> linspace(). >>>>> >>>>> > You do avoid >>>>> > accumulating large sums, but then doing the division a[i]/len(a) and >>>>> > adding that will do the same. >>>>> >>>>> Okay, this is true. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> Now, if you want to avoid losing precision, you want to use a better >>>>>> summation technique, like compensated (or Kahan) summation: >>>>>> >>>>>> def mean(a): >>>>>> s = e = a.dtype.type(0) >>>>>> for i in range(0, len(a)): >>>>>> temp = s >>>>>> y = a[i] + e >>>>>> s = temp + y >>>>>> e = (temp - s) + y >>>>>> return s / len(a) >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>>>> def var(a): >>>>>>> m = a[0] >>>>>>> t = a.dtype.type(0) >>>>>>> for i in range(1, len(a)): >>>>>>> q = a[i] - m >>>>>>> r = q / (i+1) >>>>>>> m += r >>>>>>> t += i * q * r >>>>>>> t /= len(a) >>>>>>> return t >>>>>>> >>>>>>> Alternatively, from Knuth: >>>>>>> >>>>>>> def var_knuth(a): >>>>>>> m = a.dtype.type(0) >>>>>>> variance = a.dtype.type(0) >>>>>>> for i in range(len(a)): >>>>>>> delta = a[i] - m >>>>>>> m += delta / (i+1) >>>>>>> variance += delta * (a[i] - m) >>>>>>> variance /= len(a) >>>>>>> return variance >>>>>>> >>>>>>> >>> I'm going to go ahead and attach a module containing the versions of >>> mean, var, etc that I've been playing with in case someone wants to mess >>> with them. Some were stolen from traffic on this list, for others I >>> grabbed the algorithms from wikipedia or equivalent. >>> >>> >> I looked into this a bit more. I checked float32 (single precision) and >> float64 (double precision), using long doubles (float96) for the "exact" >> results. This is based on your code. Results are compared using >> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the >> range of [-100, 900] >> >> First, the mean. In float32, the Kahan summation in single precision is >> better by about 2 orders of magnitude than simple summation. However, >> accumulating the sum in double precision is better by about 9 orders of >> magnitude than simple summation (7 orders more than Kahan). >> >> In float64, Kahan summation is the way to go, by 2 orders of magnitude. >> >> For the variance, in float32, Knuth's method is *no better* than the two-pass >> method. Tim's code does an implicit conversion of intermediate results to >> float64, which is why he saw a much better result. >> > Doh! 
And I fixed that same problem in the mean implementation earlier > too. I was astounded by how good knuth was doing, but not astounded > enough apparently. > > Does it seem weird to anyone else that in: > numpy_scalar <op> python_scalar > the precision ends up being controlled by the python scalar? I would > expect the numpy_scalar to control the resulting precision just like > numpy arrays do in similar circumstances. Perhaps the egg on my face is > just clouding my vision though. > > >> The two-pass method using >> Kahan summation (again, in single precision), is better by about 2 orders of >> magnitude. There is practically no difference when using a double-precision >> accumulator amongst the techniques: they're all about 9 orders of magnitude >> better than single-precision two-pass. >> >> In float64, Kahan summation is again better than the rest, by about 2 orders >> of magnitude. >> >> I've put my adaptation of Tim's code, and box-and-whisker plots of the >> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ >> >> Conclusions: >> >> - If you're going to calculate everything in single precision, use Kahan >> summation. Using it in double-precision also helps. >> - If you can use a double-precision accumulator, it's much better than any of >> the techniques in single-precision only. >> >> - for speed+precision in the variance, either use Kahan summation in single >> precision with the two-pass method, or use double precision with simple >> summation with the two-pass method. Knuth buys you nothing, except slower >> code :-) >> >> > The two pass methods are definitely more accurate. I won't be convinced > on the speed front till I see comparable C implementations slug it out. > That may well mean never in practice. However, I expect that somewhere > around 10,000 items, the cache will overflow and memory bandwidth will > become the bottleneck. At that point the extra operations of Knuth won't > matter as much as making two passes through the array and Knuth will win > on speed. OK, let me take this back. I looked at this speed effect a little more and the effect is smaller than I remember. For example, "a+=a" slows down by about a factor of 2.5 on my box between 10,000 and 100,0000 elements. That's not insignificant, but assuming (naively) that this means that memory effects account to the equivalent of 1.5 add operations, this doesn't come close to closing to gap between Knuth and the two pass approach. The other thing is that while a+=a and similar operations show this behaviour, a.sum() and add.reduce(a) do not, hinting that perhaps it's only writing to memory that is a bottleneck. Or perhaps hinting that my mental model of what's going on here is badly flawed. So, +1 for me on Kahan summation for computing means and two pass with Kahan summation for variances. It would probably be nice to expose the Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for the latter in summing a bunch of disparate matrices with high precision. I'm on the fence on using the array dtype for the accumulator dtype versus always using at least double precision for the accumulator. The former is easier to explain and is probably faster, but the latter is a lot more accuracy for basically free. It depends on how the relative speeds shake out I suppose. If the speed of using float64 is comparable to that of using float32, we might as well. One thing I'm not on the fence about is the return type: it should always match the input type, or dtype if that is specified. 
Otherwise one gets mysterious, gratuitous upcasting. On the subject of upcasting, I'm still skeptical of the behavior of mixed numpy-scalar and python-scalar ops. Since numpy-scalars are generally the results of indexing operations, not literals, I think that they should behave like arrays for purposes of determining the resulting precision, not like python-scalars. Otherwise situations arise where the rank of the input array determine the kind of output (float32 versus float64 for example) since if the array is 1D indexing into the array yields numpy-scalars which then get promoted when they interact with python-scalars.However if the array is 2D, indexing yields a vector, which is not promoted when interacting with python scalars and the precision remains fixed. -tim > Of course the accuracy is pretty bad at single precision, so > the possible, theoretical speed advantage at large sizes probably > doesn't matter. > > -tim > > >> After 1.0 is out, we should look at doing one of the above. >> >> > > +1 > > > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > m bnb |
From: Charles R H. <cha...@gm...> - 2006-09-22 14:43:59
Hi, On 9/22/06, Tim Hochberg <tim...@ie...> wrote: > > Tim Hochberg wrote: > > David M. Cooke wrote: > > > >> On Thu, 21 Sep 2006 11:34:42 -0700 > >> Tim Hochberg <tim...@ie...> wrote: > >> > >> > >> > >>> Tim Hochberg wrote: > >>> > >>> > >>>> Robert Kern wrote: > >>>> > >>>> > >>>> > >>>>> David M. Cooke wrote: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Let me offer a third path: the algorithms used for .mean() and > .var() > >>>>>>> are substandard. There are much better incremental algorithms that > >>>>>>> entirely avoid the need to accumulate such large (and therefore > >>>>>>> precision-losing) intermediate values. The algorithms look like > the > >>>>>>> following for 1D arrays in Python: > >>>>>>> > >>>>>>> def mean(a): > >>>>>>> m = a[0] > >>>>>>> for i in range(1, len(a)): > >>>>>>> m += (a[i] - m) / (i + 1) > >>>>>>> return m > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> This isn't really going to be any better than using a simple sum. > >>>>>> It'll also be slower (a division per iteration). > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> With one exception, every test that I've thrown at it shows that > it's > >>>>> better for float32. That exception is uniformly spaced arrays, like > >>>>> linspace(). > >>>>> > >>>>> > You do avoid > >>>>> > accumulating large sums, but then doing the division a[i]/len(a) > and > >>>>> > adding that will do the same. > >>>>> > >>>>> Okay, this is true. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> Now, if you want to avoid losing precision, you want to use a > better > >>>>>> summation technique, like compensated (or Kahan) summation: > >>>>>> > >>>>>> def mean(a): > >>>>>> s = e = a.dtype.type(0) > >>>>>> for i in range(0, len(a)): > >>>>>> temp = s > >>>>>> y = a[i] + e > >>>>>> s = temp + y > >>>>>> e = (temp - s) + y > >>>>>> return s / len(a) > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>>>> def var(a): > >>>>>>> m = a[0] > >>>>>>> t = a.dtype.type(0) > >>>>>>> for i in range(1, len(a)): > >>>>>>> q = a[i] - m > >>>>>>> r = q / (i+1) > >>>>>>> m += r > >>>>>>> t += i * q * r > >>>>>>> t /= len(a) > >>>>>>> return t > >>>>>>> > >>>>>>> Alternatively, from Knuth: > >>>>>>> > >>>>>>> def var_knuth(a): > >>>>>>> m = a.dtype.type(0) > >>>>>>> variance = a.dtype.type(0) > >>>>>>> for i in range(len(a)): > >>>>>>> delta = a[i] - m > >>>>>>> m += delta / (i+1) > >>>>>>> variance += delta * (a[i] - m) > >>>>>>> variance /= len(a) > >>>>>>> return variance > >>>>>>> > >>>>>>> > >>> I'm going to go ahead and attach a module containing the versions of > >>> mean, var, etc that I've been playing with in case someone wants to > mess > >>> with them. Some were stolen from traffic on this list, for others I > >>> grabbed the algorithms from wikipedia or equivalent. > >>> > >>> > >> I looked into this a bit more. I checked float32 (single precision) and > >> float64 (double precision), using long doubles (float96) for the > "exact" > >> results. This is based on your code. Results are compared using > >> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values > in the > >> range of [-100, 900] > >> > >> First, the mean. In float32, the Kahan summation in single precision is > >> better by about 2 orders of magnitude than simple summation. However, > >> accumulating the sum in double precision is better by about 9 orders of > >> magnitude than simple summation (7 orders more than Kahan). 
> >> > >> In float64, Kahan summation is the way to go, by 2 orders of magnitude. > >> > >> For the variance, in float32, Knuth's method is *no better* than the > two-pass > >> method. Tim's code does an implicit conversion of intermediate results > to > >> float64, which is why he saw a much better result. > >> > > Doh! And I fixed that same problem in the mean implementation earlier > > too. I was astounded by how good knuth was doing, but not astounded > > enough apparently. > > > > Does it seem weird to anyone else that in: > > numpy_scalar <op> python_scalar > > the precision ends up being controlled by the python scalar? I would > > expect the numpy_scalar to control the resulting precision just like > > numpy arrays do in similar circumstances. Perhaps the egg on my face is > > just clouding my vision though. > > > > > >> The two-pass method using > >> Kahan summation (again, in single precision), is better by about 2 > orders of > >> magnitude. There is practically no difference when using a > double-precision > >> accumulator amongst the techniques: they're all about 9 orders of > magnitude > >> better than single-precision two-pass. > >> > >> In float64, Kahan summation is again better than the rest, by about 2 > orders > >> of magnitude. > >> > >> I've put my adaptation of Tim's code, and box-and-whisker plots of the > >> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > >> > >> Conclusions: > >> > >> - If you're going to calculate everything in single precision, use > Kahan > >> summation. Using it in double-precision also helps. > >> - If you can use a double-precision accumulator, it's much better than > any of > >> the techniques in single-precision only. > >> > >> - for speed+precision in the variance, either use Kahan summation in > single > >> precision with the two-pass method, or use double precision with simple > >> summation with the two-pass method. Knuth buys you nothing, except > slower > >> code :-) > >> > >> > > The two pass methods are definitely more accurate. I won't be convinced > > on the speed front till I see comparable C implementations slug it out. > > That may well mean never in practice. However, I expect that somewhere > > around 10,000 items, the cache will overflow and memory bandwidth will > > become the bottleneck. At that point the extra operations of Knuth won't > > matter as much as making two passes through the array and Knuth will win > > on speed. > OK, let me take this back. I looked at this speed effect a little more > and the effect is smaller than I remember. For example, "a+=a" slows > down by about a factor of 2.5 on my box between 10,000 and 100,0000 > elements. That's not insignificant, but assuming (naively) that this > means that memory effects account to the equivalent of 1.5 add > operations, this doesn't come close to closing to gap between Knuth and > the two pass approach. The other thing is that while a+=a and similar > operations show this behaviour, a.sum() and add.reduce(a) do not, > hinting that perhaps it's only writing to memory that is a bottleneck. > Or perhaps hinting that my mental model of what's going on here is badly > flawed. > > So, +1 for me on Kahan summation for computing means and two pass with > Kahan summation for variances. It would probably be nice to expose the > Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for > the latter in summing a bunch of disparate matrices with high precision. 
> > I'm on the fence on using the array dtype for the accumulator dtype > versus always using at least double precision for the accumulator. I keep going back and forth on this myself. So I ask myself what I do in practice. In practice, I accumulate in doubles, sometimes extended precision. I only think of accumulating in floats when the number of items is small and computation might be noticeably faster -- and these days it seldom is. The days of the Vax and a 2x speed difference are gone, now we generally have FPUs whose base type is double with float tacked on as an afterthought. The only lesson from those days that remains pertinent is: never, ever, do division. So in practice, I would go with doubles. The only good reason to use floats for floats is a sort of pure consistency. Chuck |
From: Tim H. <tim...@ie...> - 2006-09-22 16:06:36
Sebastian Haase wrote: > On Thursday 21 September 2006 15:28, Tim Hochberg wrote: > >> David M. Cooke wrote: >> >>> On Thu, 21 Sep 2006 11:34:42 -0700 >>> >>> Tim Hochberg <tim...@ie...> wrote: >>> >>>> Tim Hochberg wrote: >>>> >>>>> Robert Kern wrote: >>>>> >>>>>> David M. Cooke wrote: >>>>>> >>>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: >>>>>>> >>>>>>>> Let me offer a third path: the algorithms used for .mean() and >>>>>>>> .var() are substandard. There are much better incremental algorithms >>>>>>>> that entirely avoid the need to accumulate such large (and therefore >>>>>>>> precision-losing) intermediate values. The algorithms look like the >>>>>>>> following for 1D arrays in Python: >>>>>>>> >>>>>>>> def mean(a): >>>>>>>> m = a[0] >>>>>>>> for i in range(1, len(a)): >>>>>>>> m += (a[i] - m) / (i + 1) >>>>>>>> return m >>>>>>>> >>>>>>> This isn't really going to be any better than using a simple sum. >>>>>>> It'll also be slower (a division per iteration). >>>>>>> >>>>>> With one exception, every test that I've thrown at it shows that it's >>>>>> better for float32. That exception is uniformly spaced arrays, like >>>>>> linspace(). >>>>>> >>>>>> > You do avoid >>>>>> > accumulating large sums, but then doing the division a[i]/len(a) >>>>>> > and adding that will do the same. >>>>>> >>>>>> Okay, this is true. >>>>>> >>>>>> >>>>>>> Now, if you want to avoid losing precision, you want to use a better >>>>>>> summation technique, like compensated (or Kahan) summation: >>>>>>> >>>>>>> def mean(a): >>>>>>> s = e = a.dtype.type(0) >>>>>>> for i in range(0, len(a)): >>>>>>> temp = s >>>>>>> y = a[i] + e >>>>>>> s = temp + y >>>>>>> e = (temp - s) + y >>>>>>> return s / len(a) >>>>>>> >>>>>>> >>>>>>>> def var(a): >>>>>>>> m = a[0] >>>>>>>> t = a.dtype.type(0) >>>>>>>> for i in range(1, len(a)): >>>>>>>> q = a[i] - m >>>>>>>> r = q / (i+1) >>>>>>>> m += r >>>>>>>> t += i * q * r >>>>>>>> t /= len(a) >>>>>>>> return t >>>>>>>> >>>>>>>> Alternatively, from Knuth: >>>>>>>> >>>>>>>> def var_knuth(a): >>>>>>>> m = a.dtype.type(0) >>>>>>>> variance = a.dtype.type(0) >>>>>>>> for i in range(len(a)): >>>>>>>> delta = a[i] - m >>>>>>>> m += delta / (i+1) >>>>>>>> variance += delta * (a[i] - m) >>>>>>>> variance /= len(a) >>>>>>>> return variance >>>>>>>> >>>> I'm going to go ahead and attach a module containing the versions of >>>> mean, var, etc that I've been playing with in case someone wants to mess >>>> with them. Some were stolen from traffic on this list, for others I >>>> grabbed the algorithms from wikipedia or equivalent. >>>> >>> I looked into this a bit more. I checked float32 (single precision) and >>> float64 (double precision), using long doubles (float96) for the "exact" >>> results. This is based on your code. Results are compared using >>> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in >>> the range of [-100, 900] >>> >>> First, the mean. In float32, the Kahan summation in single precision is >>> better by about 2 orders of magnitude than simple summation. However, >>> accumulating the sum in double precision is better by about 9 orders of >>> magnitude than simple summation (7 orders more than Kahan). >>> >>> In float64, Kahan summation is the way to go, by 2 orders of magnitude. >>> >>> For the variance, in float32, Knuth's method is *no better* than the >>> two-pass method. Tim's code does an implicit conversion of intermediate >>> results to float64, which is why he saw a much better result. >>> >> Doh! 
And I fixed that same problem in the mean implementation earlier >> too. I was astounded by how good knuth was doing, but not astounded >> enough apparently. >> >> Does it seem weird to anyone else that in: >> numpy_scalar <op> python_scalar >> the precision ends up being controlled by the python scalar? I would >> expect the numpy_scalar to control the resulting precision just like >> numpy arrays do in similar circumstances. Perhaps the egg on my face is >> just clouding my vision though. >> >> >>> The two-pass method using >>> Kahan summation (again, in single precision), is better by about 2 orders >>> of magnitude. There is practically no difference when using a >>> double-precision accumulator amongst the techniques: they're all about 9 >>> orders of magnitude better than single-precision two-pass. >>> >>> In float64, Kahan summation is again better than the rest, by about 2 >>> orders of magnitude. >>> >>> I've put my adaptation of Tim's code, and box-and-whisker plots of the >>> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ >>> >>> Conclusions: >>> >>> - If you're going to calculate everything in single precision, use Kahan >>> summation. Using it in double-precision also helps. >>> - If you can use a double-precision accumulator, it's much better than >>> any of the techniques in single-precision only. >>> >>> - for speed+precision in the variance, either use Kahan summation in >>> single precision with the two-pass method, or use double precision with >>> simple summation with the two-pass method. Knuth buys you nothing, except >>> slower code :-) >>> >> The two pass methods are definitely more accurate. I won't be convinced >> on the speed front till I see comparable C implementations slug it out. >> That may well mean never in practice. However, I expect that somewhere >> around 10,000 items, the cache will overflow and memory bandwidth will >> become the bottleneck. At that point the extra operations of Knuth won't >> matter as much as making two passes through the array and Knuth will win >> on speed. Of course the accuracy is pretty bad at single precision, so >> the possible, theoretical speed advantage at large sizes probably >> doesn't matter. >> >> -tim >> >> >>> After 1.0 is out, we should look at doing one of the above. >>> >> +1 >> >> > Hi, > I don't understand much of these "fancy algorithms", but does the above mean > that > after 1.0 is released the float32 functions will catch up in terms of > precision precision ("to some extend") with using dtype=float64 ? > It sounds like there is some consensus to do something to improve the precision after 1.0 comes out. I don't think the details are entirely nailed down, but it sounds like some combination of using Kahan summation and increasing the minimum precision of the accumulator is likely for sum, mean, var and std. -tim |
From: Christopher B. <Chr...@no...> - 2006-09-22 16:34:51
Tim Hochberg wrote: > It would probably be nice to expose the > Kahan sum and maybe even the raw_kahan_sum somewhere. What about using it for .sum() by default? What is the speed hit anyway? In any case, having it available would be nice. > I'm on the fence on using the array dtype for the accumulator dtype > versus always using at least double precision for the accumulator. The > former is easier to explain and is probably faster, but the latter is a > lot more accuracy for basically free. I don't think the difficulty of explanation is a big issue at all -- I'm having a really hard time imagining someone getting confused and/or disappointed that their single precision calculation didn't overflow or was more accurate than expected. Anyone that did, would know enough to understand the explanation. In general, users expect things to "just work". They only dig into the details when something goes wrong, like the mean of a bunch of positive integers coming out as negative (wasn't that the OP's example?). The fewer such instance we have, the fewer questions we have. > speeds shake out I suppose. If the speed of using float64 is comparable > to that of using float32, we might as well. Only testing will tell, but my experience is that with double precision FPUs, doubles are just as fast as floats unless you're dealing with enough memory to make a difference in caching. In this case, only the accumulator is double, so that wouldn't be an issue. I suppose the float to double conversion could take some time though... > One thing I'm not on the > fence about is the return type: it should always match the input type, > or dtype if that is specified. +1 > Since numpy-scalars are > generally the results of indexing operations, not literals, I think that > they should behave like arrays for purposes of determining the resulting > precision, not like python-scalars. +1 >> Of course the accuracy is pretty bad at single precision, so >> the possible, theoretical speed advantage at large sizes probably >> doesn't matter. good point. the larger the array -- the more accuracy matters. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
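Chris's closing point (the larger the array, the more the accumulator precision matters) can be made concrete without any loops: float32 carries a 24-bit significand, so once a running sum of order-1 values reaches 2**24 it stops moving entirely.

import numpy as np

acc = np.float32(2**24)            # 16777216.0: the float32 integer-resolution limit
print(acc + np.float32(1.0))       # 16777216.0 -- the added 1.0 is lost outright
print(np.float64(acc) + 1.0)       # 16777217.0 -- a double accumulator keeps it

# So a naive float32 accumulator over ~10**7 order-1 values simply plateaus,
# which is the kind of surprise behind this thread.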
From: Charles R H. <cha...@gm...> - 2006-09-20 03:43:02
|
On 9/19/06, Charles R Harris <cha...@gm...> wrote:
>
> On 9/19/06, Sebastian Haase <ha...@ms...> wrote:
> >
> > Travis Oliphant wrote:
> > > Sebastian Haase wrote:
> > >> I still would argue that getting a "good" (smaller rounding errors)
> > >> answer should be the default -- if speed is wanted, then *that*
> > >> could still be specified by explicitly using dtype=float32 (which
> > >> would also be a possible choice for int32 input).
> > >>
> > > So you are arguing for using long double then.... ;-)
> > >
> > >> In image processing we always want means to be calculated in
> > >> float64 even though input data is always float32 (if not uint16).
> > >>
> > >> Also it is simpler to say "float64 is the default" (full stop.) -
> > >> instead of "float64 is the default unless you have float32"
> > >>
> > > "the type you have is the default except for integers". Do you
> > > really want float64 to be the default for float96?
> > >
> > > Unless we are going to use long double as the default, then I'm not
> > > convinced that we should special-case the "double" type.
> > >
> > I guess I'm not really aware of the float96 type ...
> > Is that a "machine type" on any system? I always thought that -- e.g.
> > coming from C -- double is "as good as it gets"...
> > Who uses float96? I heard somewhere that (some) CPUs use 80 bits
> > internally when calculating 64-bit double precision...
> >
> > Is this not going into some academic argument!?
> > For all I know, calculating mean()s (and sum()s, ...) is always done
> > in double precision -- never in single precision, even when the data
> > is in float32.
> >
> > Having float32 be the default for float32 data is just requiring more
> > typing, and more explaining ... it would compromise numpy usability
> > as a day-to-day replacement for other systems.
> >
> > Sorry, if I'm being ignorant ...
>
> I'm going to side with Travis here. It is only a default and easily
> overridden. And yes, there are precisions greater than double. I was
> using quad precision back in the eighties on a VAX for some inherently
> ill-conditioned problems. And on my machine long double is 12 bytes.
>
Here is the 754r (revision) spec: http://en.wikipedia.org/wiki/IEEE_754r

It includes quads (128 bits) and half precision (16 bits) floats. I
believe the latter are used for some imaging stuff, radar for instance,
and are also available in some high-end GPUs from Nvidia and other
companies. The 80-bit numbers you refer to were defined as extended
precision in the original 754 spec and were mostly intended for
temporaries in internal FPU computations. They have various alignment
requirements for efficient use, which is why they show up as 96 bits
(4-byte alignment) and sometimes 128 bits (8-byte alignment). So
actually, float128 would not always distinguish between extended
precision and quad precision. I see more work for Travis in the future ;)

Chuck
|
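A side note for later readers: the half-precision type mentioned above did
eventually appear in NumPy as float16, so its properties can be inspected
like any other float type. A small illustrative snippet (not part of the
original discussion):

    import numpy as np

    # IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 stored mantissa bits.
    h = np.finfo(np.float16)
    print("storage bits:", np.dtype(np.float16).itemsize * 8)   # 16
    print("mantissa bits:", h.nmant)                            # 10
    print("largest value:", h.max)                              # 65504.0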
From: Charles R H. <cha...@gm...> - 2006-09-20 04:18:55
|
On 9/19/06, Charles R Harris <cha...@gm...> wrote:
>
> On 9/19/06, Charles R Harris <cha...@gm...> wrote:
> >
> > On 9/19/06, Sebastian Haase <ha...@ms...> wrote:
> > >
> > > Travis Oliphant wrote:
> > > > Sebastian Haase wrote:
> > > >> I still would argue that getting a "good" (smaller rounding
> > > >> errors) answer should be the default -- if speed is wanted, then
> > > >> *that* could still be specified by explicitly using dtype=float32
> > > >> (which would also be a possible choice for int32 input).
> > > >>
> > > > So you are arguing for using long double then.... ;-)
> > > >
> > > >> In image processing we always want means to be calculated in
> > > >> float64 even though input data is always float32 (if not uint16).
> > > >>
> > > >> Also it is simpler to say "float64 is the default" (full stop.) -
> > > >> instead of "float64 is the default unless you have float32"
> > > >>
> > > > "the type you have is the default except for integers". Do you
> > > > really want float64 to be the default for float96?
> > > >
> > > > Unless we are going to use long double as the default, then I'm
> > > > not convinced that we should special-case the "double" type.
> > > >
> > > I guess I'm not really aware of the float96 type ...
> > > Is that a "machine type" on any system? I always thought that --
> > > e.g. coming from C -- double is "as good as it gets"...
> > > Who uses float96? I heard somewhere that (some) CPUs use 80 bits
> > > internally when calculating 64-bit double precision...
> > >
> > > Is this not going into some academic argument!?
> > > For all I know, calculating mean()s (and sum()s, ...) is always done
> > > in double precision -- never in single precision, even when the data
> > > is in float32.
> > >
> > > Having float32 be the default for float32 data is just requiring
> > > more typing, and more explaining ... it would compromise numpy
> > > usability as a day-to-day replacement for other systems.
> > >
> > > Sorry, if I'm being ignorant ...
> >
> > I'm going to side with Travis here. It is only a default and easily
> > overridden. And yes, there are precisions greater than double. I was
> > using quad precision back in the eighties on a VAX for some inherently
> > ill-conditioned problems. And on my machine long double is 12 bytes.
>
> Here is the 754r (revision) spec: http://en.wikipedia.org/wiki/IEEE_754r
>
> It includes quads (128 bits) and half precision (16 bits) floats. I
> believe the latter are used for some imaging stuff, radar for instance,
> and are also available in some high-end GPUs from Nvidia and other
> companies. The 80-bit numbers you refer to were defined as extended
> precision in the original 754 spec and were mostly intended for
> temporaries in internal FPU computations. They have various alignment
> requirements for efficient use, which is why they show up as 96 bits
> (4-byte alignment) and sometimes 128 bits (8-byte alignment). So
> actually, float128 would not always distinguish between extended
> precision and quad precision. I see more work for Travis in the future
> ;)

I just checked this out. On amd64, 32-bit Linux gives 12 bytes for long
double and 64-bit Linux gives 16 bytes, but they both have 64-bit
mantissas, i.e., they are both 80-bit extended precision. Those sizes are
the defaults and can be overridden by compiler flags.

Anyway, we may need some way to tell the difference between
extended-precision float128 and true quads, since they will both have the
same length on 64-bit architectures. But that is a problem for the
future.

Chuck
|
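A quick way to check what "long double" actually is on a given platform,
in the spirit of the test above (a sketch using np.finfo; the exact
numbers depend on platform and compiler flags): the storage size alone
does not reveal the precision, but the mantissa width does.

    import numpy as np

    # longdouble may be 80-bit x87 extended precision padded to 96 or 128
    # bits of storage, true IEEE quad, or just plain double. The mantissa
    # width tells these apart even when the storage sizes coincide.
    ld = np.finfo(np.longdouble)
    print("storage bits:", np.dtype(np.longdouble).itemsize * 8)
    print("mantissa bits:", ld.nmant)  # 52 -> double, 63 -> x87 extended, 112 -> IEEE quad
    print("machine epsilon:", ld.eps)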
From: Christopher B. <Chr...@no...> - 2006-09-20 16:57:08
|
Sebastian Haase wrote:
> The best I can hope for is a "sound" default for most (practical)
> cases... I still think that 80-bit vs. 128-bit vs. 96-bit is rather
> academic for most people ... most people seem to only use float64 and
> then there are some that use float32 (like us) ...

I fully agree with Sebastian here. As Travis pointed out, "all we are
talking about is the default". The default should follow the principle of
least surprise for the less-knowledgeable users. Personally, I try to
always use doubles, unless I have a real reason not to. The recent change
of default types for zeros et al. will help.

Clearly, there is a problem with saying that the default accumulator for
*all* types is double, as you wouldn't want to downcast float128s, even
if they are obscure. However, is it that hard to say that the default
accumulator will have *at least* the precision of double?

Robert Kern wrote:
> Let me offer a third path: the algorithms used for .mean() and .var()
> are substandard. There are much better incremental algorithms that
> entirely avoid the need to accumulate such large (and therefore
> precision-losing) intermediate values.

This, of course, is an even better solution, unless there are substantial
performance impacts.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chr...@no...
|
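One way to express the "at least the precision of double" rule in code, as
a sketch only (it uses the promote_types helper from later NumPy releases,
so it is not something that was available at the time of this thread):

    import numpy as np

    def mean_at_least_double(a):
        # Accumulate in at least double precision: float32 and integer
        # input gets a float64 accumulator, while longdouble input keeps
        # its own, higher precision rather than being downcast.
        acc_dtype = np.promote_types(a.dtype, np.float64)
        return a.mean(dtype=acc_dtype)

    print(mean_at_least_double(np.ones(10, dtype=np.float32)).dtype)    # float64
    print(mean_at_least_double(np.ones(10, dtype=np.int32)).dtype)      # float64
    print(mean_at_least_double(np.ones(10, dtype=np.longdouble)).dtype) # longdouble (float96/float128)

If the return type is supposed to match the input type, as Tim suggests
earlier in the thread, the result could simply be cast back to a.dtype at
the end.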