From: Tim H. <tim...@ie...> - 2006-09-20 16:08:38
Robert Kern wrote:
> Sebastian Haase wrote:
>> Robert Kern wrote:
>>> Sebastian Haase wrote:
>>>> I know that having too much knowledge of the details often makes
>>>> one forget what the "newcomers" will do and expect.
>>>
>>> Please be more careful with such accusations. Repeated frequently,
>>> they can become quite insulting.
>>
>> I did not mean to insult anyone -- what I meant was that I'm for
>> numpy becoming an easy platform to use. I have spent and enjoyed part
>> of the last four years developing and evangelizing Python as an
>> alternative to Matlab and C/Fortran-based image analysis
>> environments. I often find myself arguing for good support of the
>> single-precision data format, so I find it somewhat ironic to see
>> myself arguing now for wanting float64 over float32 ;-)
>
> No one is doubting that you want numpy to be easy to use. Please
> don't assume that the rest of us want otherwise. However, the fact
> that you *want* numpy to be easy to use does not mean that your
> suggestions *will* make numpy easy to use.
>
> We haven't forgotten what newcomers will do; on the contrary, we are
> quite aware that new users need consistent behavior in order to learn
> how to use a system. Adding another special case in how dtypes
> implicitly convert to one another will impede new users' ability to
> understand the whole system. See A. M. Archibald's question in the
> thread "ufunc.reduce and conversion" for an example. In our judgement
> this is a worse outcome than notational convenience for float32
> users, who already need to be aware of the effects of their precision
> choice. Each of us can come to different conclusions in good faith
> without one of us forgetting the new-user experience.
>
> Let me offer a third path: the algorithms used for .mean() and .var()
> are substandard. There are much better incremental algorithms that
> entirely avoid the need to accumulate such large (and therefore
> precision-losing) intermediate values. The algorithms look like the
> following for 1D arrays in Python:
>
>     def mean(a):
>         # Running mean: fold each element's deviation from the
>         # current mean into m, so no large sum ever accumulates.
>         m = a[0]
>         for i in range(1, len(a)):
>             m += (a[i] - m) / (i + 1)
>         return m
>
>     def var(a):
>         # One-pass variance: m is the running mean, t the running
>         # sum of squared deviations about it.
>         m = a[0]
>         t = a.dtype.type(0)
>         for i in range(1, len(a)):
>             q = a[i] - m
>             r = q / (i + 1)
>             m += r
>             t += i * q * r
>         t /= len(a)
>         return t
>
> Alternatively, from Knuth:
>
>     def var_knuth(a):
>         # Welford's one-pass algorithm, as given by Knuth: update
>         # the mean and the sum of squared deviations together.
>         m = a.dtype.type(0)
>         variance = a.dtype.type(0)
>         for i in range(len(a)):
>             delta = a[i] - m
>             m += delta / (i + 1)
>             variance += delta * (a[i] - m)
>         variance /= len(a)
>         return variance
>
> If you will code up implementations of these for ndarray.mean() and
> ndarray.var(), I will check them in, and then float32 arrays will
> avoid most of the catastrophes that the current implementations run
> into.

+1

>>>> We are only talking about people that will a) work with
>>>> single-precision data (e.g. large-scale image analysis) and who
>>>> b) will tend to "just use the default" (dtype) --- how else can I
>>>> say this: these people will just assume that arr.mean() *is* the
>>>> mean of arr.
>>>
>>> I don't understand what you mean here. arr.mean() is almost never
>>> *the* mean of arr. Double precision can't fix that.
>>
>> This was not supposed to be a scientific statement -- I'm (again)
>> thinking of our students, who do not always appreciate the full
>> complexity of computational numerics, data types, and such.
>
> They need to appreciate the complexity of computational numerics if
> they are going to do numerical computation. Double precision does not
> make it any simpler.
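
The failure mode Kern's incremental algorithms address is easy to
reproduce. Below is a minimal sketch (not from the original thread;
the array size and the use of cumsum as a stand-in for sequential
accumulation are illustrative assumptions). It shows a naive float32
accumulator saturating, while the running-mean update from the message
above stays accurate because its accumulator never grows beyond the
magnitude of the mean itself:

    import numpy as np

    # 2**25 ones: the exact sum is 33554432 and the exact mean is 1.0.
    a = np.ones(2**25, dtype=np.float32)

    # cumsum accumulates sequentially in the array's own dtype. Once
    # the running sum reaches 2**24 = 16777216, adding 1.0 no longer
    # changes a float32 accumulator, so the sum saturates.
    running = a.cumsum()
    print(running[-1])           # 16777216.0 -- half the true sum
    print(running[-1] / len(a))  # naive mean: 0.5 instead of 1.0

    def mean_incremental(a):
        # Kern's running-mean update from the message above.
        m = a[0]
        for i in range(1, len(a)):
            m += (a[i] - m) / (i + 1)
        return m

    # Exact for constant data (every update adds zero); run on a slice
    # only to keep the pure-Python loop quick.
    print(mean_incremental(a[:100000]))  # 1.0
    print(a.mean(dtype=np.float64))      # reference value: 1.0

Newer numpy releases use pairwise summation, which masks this
particular failure in .mean() itself, but cumsum still accumulates
element by element, and the underlying point about accumulator
magnitude stands.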