From: Matthew B. <mat...@gm...> - 2006-09-23 11:37:01
|
Hi,

Forgive my ignorance, but why is this?

In [1]: from numpy import *

In [2]: if not dtype('<f8'):
   ...:     print 'Truth value is no'
   ...:
   ...:
Truth value is no

Best,

Matthew |
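The thread excerpt does not answer the question, but here is a minimal probe (not part of the original message). One possible explanation -- an assumption, not something stated in the thread -- is that dtype objects report len() as their number of record fields, which is zero for a plain scalar dtype, and Python falls back to len() for truth testing:

import numpy as np

dt = np.dtype('<f8')
print(len(dt))    # 0 -- a plain (non-record) dtype reports zero fields
print(bool(dt))   # what the "if not dtype('<f8')" test above is really checking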
From: Bill B. <wb...@gm...> - 2006-09-23 03:53:04
|
Thanks for the explanations, folks, I thought that might be the case.

On 9/23/06, Sebastian Haase <ha...@ms...> wrote:
> I'm working with "stacks of 2d arrays" -- but I was always under the
> impression that -- since the last axis is the "fastest" -- "stacks of
> <something>" should stack in the first dimension -- not the last -- so that
> the members of the stack still remain contiguous in memory.

I also don't see why you'd want to stack matrices along the last axis in
numpy. C-style memory order is one reason not to, and printing also seems to
favor stacking matrices along the first (0th) axis instead of the last.

> Is there a general consensus on how one should look at this? Or are there
> multiple incompatible views -- maybe coming from different fields?

I've been stacking along the first axis myself. Perhaps the original folks who
implemented atleast_3d were coming from Matlab, where stacking is generally
done along the last axis. But everything in Matlab is Fortran order under the
hood, so it makes sense there. Anyway, glad to know it was just the
implementer's idiosyncrasy rather than some deep secret of the universe that I
wasn't aware of.

I sometimes wonder what sort of overhead is induced by numpy using primarily
C-style order while many of the underlying algorithms we use are in Fortran. I
suspect f2py has to do a lot of copying to get things in the right order for
making the calls to Fortran.

--bb |
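A small sketch (not from the thread) of why stacking along the first axis keeps each 2-D member contiguous in C order; the array names are made up for illustration:

import numpy as np

a = np.zeros((4, 5))                  # one 2-D "member"
stack_first = np.array([a, a, a])     # shape (3, 4, 5): stacked along axis 0
stack_last = np.dstack([a, a, a])     # shape (4, 5, 3): stacked along the last axis

# In C (row-major) order, each member of the axis-0 stack is one contiguous
# block of memory; selecting a member along the last axis is strided.
print(stack_first[0].flags['C_CONTIGUOUS'])      # True
print(stack_last[..., 0].flags['C_CONTIGUOUS'])  # False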
From: Travis O. <oli...@ie...> - 2006-09-23 03:31:35
|
Christian Kristukat wrote:
>>> Ok. Does axis=None then mean, that take(a, ind) operates on the
>>> flattened array?

Yes, that is correct.

> Sorry, I never really read about what ufuncs are. I thought those are class
> methods of the ndarray objects... Anyway, I was referring to the following
> difference:
>
> In [7]: a
> Out[7]:
> array([[ 0,  1,  2,  3,  4,  5],
>        [ 6,  7,  8,  9, 10, 11]])
>
> In [8]: a.take([0])
> Out[8]: array([[0, 1, 2, 3, 4, 5]])
>
> In [9]: take(a,[0])
> Out[9]: array([0])

Doh! That is a bug. take(a, [0]) is correct; a.take([0]) is not.

-Travis |
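For reference, a sketch (not part of the original message) of the behaviour being discussed: with the default axis=None, both the function and the method should operate on the flattened array, so the two calls below should agree.

import numpy as np

a = np.arange(12).reshape(2, 6)

print(np.take(a, [0]))      # array([0]) -- axis=None, taken from the flattened array
print(a.take([0]))          # array([0]) once the bug described above is fixed
print(a.take([0], axis=0))  # array([[0, 1, 2, 3, 4, 5]]) -- explicit axis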
From: Christian K. <ck...@ho...> - 2006-09-23 02:00:44
|
Travis Oliphant <oliphant.travis <at> ieee.org> writes:

> Christian Kristukat wrote:
> > Bill Baxter <wbaxter <at> gmail.com> writes:
> >
> >> Yep, check the release notes:
> >> http://www.scipy.org/ReleaseNotes/NumPy_1.0
> >> search for 'take' on that page to find out what others have changed as well.
> >> --bb
> >
> > Ok. Does axis=None then mean, that take(a, ind) operates on the
> > flattened array? This is at least what it seems to be. I noticed that
> > the ufunc behaves differently. a.take(ind) and a.take(ind, axis=0)
> > behave the same, so the default argument to axis is 0 rather than None.
>
> What do you mean? There is no "ufunc" take. There is a function take
> that just calls the method. The default arguments for all functions

Sorry, I never really read about what ufuncs are. I thought those are class
methods of the ndarray objects... Anyway, I was referring to the following
difference:

In [7]: a
Out[7]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [8]: a.take([0])
Out[8]: array([[0, 1, 2, 3, 4, 5]])

In [9]: take(a,[0])
Out[9]: array([0])

To be sure I understood: does axis=None then mean that take operates on the
flattened array?

Christian |
From: Sebastian H. <ha...@ms...> - 2006-09-22 21:50:06
|
On Friday 22 September 2006 12:57, Travis Oliphant wrote:
> Bill Baxter wrote:
> > 26 weeks, 4 days, 2 hours and 9 minutes ago, Zdeněk Hurák asked why
> > atleast_3d acts the way it does:
> > http://article.gmane.org/gmane.comp.python.numeric.general/4382/match=atleast+3d
> >
> > He doesn't seem to have gotten any answers. And now I'm wondering the
> > same thing. Anyone have any idea?
>
> This function came from scipy and was written by somebody at Enthought.
> I was hoping they would respond. The behavior of atleast_3d does make
> sense in the context of atleast_2d and thinking of 3-d arrays as
> "stacks" of 2-d arrays where the stacks are in the last dimension.
>
> atleast_2d converts 1-d arrays to 1xN arrays.
>
> atleast_3d converts 1-d arrays to 1xNx1 arrays so that they can be
> "stacked" in the last dimension. I agree that this isn't consistent
> with the general notion of "pre-pending" 1's to increase the
> dimensionality of the array.
>
> However, array(a, copy=False, ndmin=3) will always produce arrays with
> a 1 at the beginning. So atleast_3d is convenient if you like to think
> of 3-d arrays as stacks of 2-d arrays where the last axis is the
> "stacking" dimension.

I'm working with "stacks of 2d arrays" -- but I was always under the
impression that -- since the last axis is the "fastest" -- "stacks of
<something>" should stack in the first dimension -- not the last -- so that
the members of the stack still remain contiguous in memory.

Is there a general consensus on how one should look at this? Or are there
multiple incompatible views -- maybe coming from different fields?

-Sebastian |
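A sketch (not from the thread) showing the two behaviours being contrasted: ndmin prepends the new length-1 axes, while atleast_3d places the 1-D data in the middle so that 2-D inputs can be stacked along the last axis.

import numpy as np

a = np.arange(5)

print(np.array(a, copy=False, ndmin=3).shape)  # (1, 1, 5) -- 1's prepended
print(np.atleast_3d(a).shape)                  # (1, 5, 1) -- ready for dstack
print(np.atleast_2d(a).shape)                  # (1, 5)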
From: Robert K. <rob...@gm...> - 2006-09-22 21:21:23
|
Travis Oliphant wrote:
> Bill Baxter wrote:
>> 26 weeks, 4 days, 2 hours and 9 minutes ago, Zdeněk Hurák asked why
>> atleast_3d acts the way it does:
>> http://article.gmane.org/gmane.comp.python.numeric.general/4382/match=atleast+3d
>>
>> He doesn't seem to have gotten any answers. And now I'm wondering the
>> same thing. Anyone have any idea?
>
> This function came from scipy and was written by somebody at Enthought.
> I was hoping they would respond.

That would have either been from Eric or Travis V way back in the day. Both
are out of the office today. The "3D arrays as stacks of 2D arrays" is
probably as good an explanation as any. Much of Numeric (e.g. the axis dot()
reduces over) was predicated on that kind of logic. There is probably some
kind of consistency argument with dstack() and atleast_2d(), i.e.

In [46]: a = arange(5)

In [47]: atleast_3d(a)
Out[47]:
array([[[0],
        [1],
        [2],
        [3],
        [4]]])

In [48]: dstack([a, a])
Out[48]:
array([[[0, 0],
        [1, 1],
        [2, 2],
        [3, 3],
        [4, 4]]])

In [49]: dstack(map(atleast_2d, [a, a]))
Out[49]:
array([[[0, 0],
        [1, 1],
        [2, 2],
        [3, 3],
        [4, 4]]])

In [50]: dstack(map(atleast_3d, [a, a]))
Out[50]:
array([[[0, 0],
        [1, 1],
        [2, 2],
        [3, 3],
        [4, 4]]])

That's the problem with consistency arguments; there are so many things one
could be consistent with!

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth." -- Umberto Eco |
From: Travis O. <oli...@ee...> - 2006-09-22 19:57:12
|
Bill Baxter wrote:
> 26 weeks, 4 days, 2 hours and 9 minutes ago, Zdeněk Hurák asked why
> atleast_3d acts the way it does:
> http://article.gmane.org/gmane.comp.python.numeric.general/4382/match=atleast+3d
>
> He doesn't seem to have gotten any answers. And now I'm wondering the
> same thing. Anyone have any idea?

This function came from scipy and was written by somebody at Enthought. I was
hoping they would respond. The behavior of atleast_3d does make sense in the
context of atleast_2d and thinking of 3-d arrays as "stacks" of 2-d arrays
where the stacks are in the last dimension.

atleast_2d converts 1-d arrays to 1xN arrays.

atleast_3d converts 1-d arrays to 1xNx1 arrays so that they can be "stacked"
in the last dimension. I agree that this isn't consistent with the general
notion of "pre-pending" 1's to increase the dimensionality of the array.

However, array(a, copy=False, ndmin=3) will always produce arrays with a 1 at
the beginning. So atleast_3d is convenient if you like to think of 3-d arrays
as stacks of 2-d arrays where the last axis is the "stacking" dimension.

-Travis |
From: Robert K. <rob...@gm...> - 2006-09-22 19:54:44
|
Robert Kern wrote:
> <Looks at svn blame and svn log> Ah, it's Travis's fault. So he can fix it. :-)

And lo, it was fixed. Amen.

http://projects.scipy.org/scipy/scipy/changeset/2217

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth." -- Umberto Eco |
From: Mike R. <mik...@al...> - 2006-09-22 19:37:40
|
On 9/22/06, Bill Baxter <wb...@gm...> wrote:
>
> Do you also have a 64-bit processor? Just checking since you didn't
> mention it.
> --bb
>
> On 9/22/06, Sve...@ff... <Sve...@ff...> wrote:
> > I would like to read files >2Gbyte. From earlier posting I believed it
> > should be possible with python 2.5.
> >
> > I am getting MemoryError when trying to read a file larger than 2Gb.
> >
> > I am using python2.5 and numpy1.0rc1.
>
> Also, how much memory do you have?

You will have to use memory mapping as well if the file size is greater than
your memory size. But you should be hopeful. I have been working with > 10 GB
files for some time using beta versions of Python 2.5 and numpy-1.0 on a
64-bit AMD machine with 4 GB of memory.

Mike

--
mik...@al... |
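A minimal sketch (not from the thread) of the memory-mapping approach being suggested; the filename, dtype, and slice bounds are made-up placeholders.

import numpy as np

# Hypothetical file layout: a flat binary file of float32 values.
data = np.memmap('huge_file.dat', dtype=np.float32, mode='r')

# Only the slices you touch are read from disk, so files larger than RAM
# remain workable on a 64-bit OS with a 64-bit Python.
chunk = data[1000000:1001000]
print(chunk.mean())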
From: Travis O. <oli...@ee...> - 2006-09-22 19:29:25
|
Travis Oliphant wrote:
> Yes, this does explain what you are seeing. It is the behavior of
> Numeric's putmask (where this method came from). It does seem
> counter-intuitive, and I'm not sure what to do with it. In some sense
> putmask should behave the same as x[m] = w. But, on the other hand,
> was anybody actually using the "modular" indexing "feature" of putmask?
>
> Here are our options:
>
> 1) "Fix it" and risk breaking code for people who used putmask and the
>    modular indexing "feature."
> 2) Get rid of it as a method (and keep it as a function so that
>    oldnumeric can use it).
> 3) Keep everything the way it is.

O.K. putmask is keeping the same behavior (it is intentional behavior and has
been for a long time). However, because of the confusion, it is being removed
as a method (I know this is late, but it's a little-used method and was only
added by NumPy). putmask will be a function.

Also, the remaining put method will have its arguments switched to match the
function. IIRC the original switch was made to accommodate masked array
methods of the same name. But this does not really help anyway, since arr.put
for a masked array doesn't even take an indices argument, so confusing users
who don't even use masked arrays seems pointless. Scream now if you think I'm
being unreasonable. This will be in 1.0rc2 (yes, we will need rc2).

The original poster has still not explained why a[mask] = values does not work
suitably.

-Travis |
From: Tim H. <tim...@ie...> - 2006-09-22 19:18:34
|
Charles R Harris wrote:
[CHOP]
> On 9/22/06, Tim Hochberg <tim...@ie...> wrote:
>
> > It sounds like there is some consensus to do something to improve the
> > precision after 1.0 comes out. I don't think the details are entirely
> > nailed down, but it sounds like some combination of using Kahan
> > summation and increasing the minimum precision of the accumulator is
> > likely for sum, mean, var and std.
>
> I would go with double as the default except when the base precision
> is greater. Higher precisions should be available as a check, i.e., if
> the results look funny use Kahan or extended to see if roundoff is a
> problem. I think Kahan should be available because it adds precision
> to *any* base precision, even extended or quad, so there is always
> something to check against. But I don't think Kahan should be the
> default because of the speed hit.

Note that add.reduce will be available for doing simple, fast reductions, so I
consider some speed hit as acceptable. I'll be interested to see what the
actual speed difference between Kahan and straight summation is.

It may also be possible to unroll the core loop cleverly so that multiple ops
can be scheduled on the FPU in parallel. It doesn't look like naive unrolling
will work, since each iteration depends on the values of the previous one.
However, it probably doesn't matter much where the low-order bits get added
back in, so there's some flexibility in how the loop gets unrolled.

-tim

PS. Sorry about the untrimmed posts. |
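The compensated (Kahan) summation being debated, as a minimal Python sketch adapted from the algorithm quoted later in this thread; the function name is only for illustration and is not NumPy's implementation.

import numpy as np

def kahan_sum(a):
    # Sum a 1-D array while carrying the lost low-order bits in a correction term.
    a = np.asarray(a)
    s = a.dtype.type(0)   # running sum
    e = a.dtype.type(0)   # running compensation
    for x in a:
        y = x + e
        temp = s
        s = temp + y
        e = (temp - s) + y
    return s

# Example: single-precision values where straight summation loses precision.
vals = np.random.uniform(-100, 900, size=10000).astype(np.float32)
print(kahan_sum(vals), vals.sum(dtype=np.float64))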
From: Travis O. <oli...@ee...> - 2006-09-22 18:56:02
|
Brian Granger wrote:
>> You can write a function that formats arrays how you like them and then tell
>> ndarray to use it for __str__ or __repr__ using numpy.set_string_function().
>
> That seems to be a little low level for most users. Would it be hard
> to have the possibility of specifying a format string?

No, that wouldn't be hard. If you can wait several weeks, add a ticket, or
better, dig into the current printing function and add it to the
set_printoptions code.

-Travis |
From: PGM <pgm...@gm...> - 2006-09-22 18:51:19
|
Stefan,

Thanks for your suggestions, but that won't do for what I'm working on: I need
to get putmask working, or at least know that it doesn't.

Robert, thanks for your input. The function putmask doesn't work either.

Oh, thinking about it: would it be possible to have the same order of
arguments in the function and the method? Compare

putmask(array, mask, values) with array.putmask(values, mask)

The same comment applies to put:

put(array, indices, values) vs array.put(values, indices, mode)

That's a tad confusing... |
From: Brian G. <ell...@gm...> - 2006-09-22 18:44:17
|
> You can write a function that formats arrays how you like them and then tell
> ndarray to use it for __str__ or __repr__ using numpy.set_string_function().

That seems to be a little low level for most users. Would it be hard to have
the possibility of specifying a format string?

Brian |
From: Robert K. <rob...@gm...> - 2006-09-22 18:35:33
|
Brian Granger wrote:
> I want to be able to print an array in scientific notation.
>
> I have seen the set_printoptions() functions, but it doesn't really
> have an option for *always* using scientific notation.
>
> Can this be done? How?

You can write a function that formats arrays how you like them and then tell
ndarray to use it for __str__ or __repr__ using numpy.set_string_function().

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth." -- Umberto Eco |
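A sketch (not from the thread) of the approach Robert describes, assuming the numpy.set_string_function API of that era (it is no longer present in the newest NumPy releases); the formatter itself is made up for illustration.

import numpy as np

def sci_formatter(a):
    # Format every element in scientific notation, one row per line.
    rows = np.atleast_2d(a)
    return '\n'.join(' '.join('%.3e' % x for x in row) for row in rows)

# Use it for str(); pass repr=True as well if repr() should match.
np.set_string_function(sci_formatter, repr=False)

print(np.array([1.0, 0.000123, 45678.0]))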
From: Robert K. <rob...@gm...> - 2006-09-22 18:28:37
|
Scott Ransom wrote:
> argmin is currently defined as using the argmax method!

Please check out the latest source from SVN. I fixed this a few days ago.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth." -- Umberto Eco |
From: Charles R H. <cha...@gm...> - 2006-09-22 18:14:29
|
On 9/22/06, Tim Hochberg <tim...@ie...> wrote: > > Sebastian Haase wrote: > > On Thursday 21 September 2006 15:28, Tim Hochberg wrote: > > > >> David M. Cooke wrote: > >> > >>> On Thu, 21 Sep 2006 11:34:42 -0700 > >>> > >>> Tim Hochberg <tim...@ie...> wrote: > >>> > >>>> Tim Hochberg wrote: > >>>> > >>>>> Robert Kern wrote: > >>>>> > >>>>>> David M. Cooke wrote: > >>>>>> > >>>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>>>>>> > >>>>>>>> Let me offer a third path: the algorithms used for .mean() and > >>>>>>>> .var() are substandard. There are much better incremental > algorithms > >>>>>>>> that entirely avoid the need to accumulate such large (and > therefore > >>>>>>>> precision-losing) intermediate values. The algorithms look like > the > >>>>>>>> following for 1D arrays in Python: > >>>>>>>> > >>>>>>>> def mean(a): > >>>>>>>> m = a[0] > >>>>>>>> for i in range(1, len(a)): > >>>>>>>> m += (a[i] - m) / (i + 1) > >>>>>>>> return m > >>>>>>>> > >>>>>>> This isn't really going to be any better than using a simple sum. > >>>>>>> It'll also be slower (a division per iteration). > >>>>>>> > >>>>>> With one exception, every test that I've thrown at it shows that > it's > >>>>>> better for float32. That exception is uniformly spaced arrays, like > >>>>>> linspace(). > >>>>>> > >>>>>> > You do avoid > >>>>>> > accumulating large sums, but then doing the division a[i]/len(a) > >>>>>> > and adding that will do the same. > >>>>>> > >>>>>> Okay, this is true. > >>>>>> > >>>>>> > >>>>>>> Now, if you want to avoid losing precision, you want to use a > better > >>>>>>> summation technique, like compensated (or Kahan) summation: > >>>>>>> > >>>>>>> def mean(a): > >>>>>>> s = e = a.dtype.type(0) > >>>>>>> for i in range(0, len(a)): > >>>>>>> temp = s > >>>>>>> y = a[i] + e > >>>>>>> s = temp + y > >>>>>>> e = (temp - s) + y > >>>>>>> return s / len(a) > >>>>>>> > >>>>>>> > >>>>>>>> def var(a): > >>>>>>>> m = a[0] > >>>>>>>> t = a.dtype.type(0) > >>>>>>>> for i in range(1, len(a)): > >>>>>>>> q = a[i] - m > >>>>>>>> r = q / (i+1) > >>>>>>>> m += r > >>>>>>>> t += i * q * r > >>>>>>>> t /= len(a) > >>>>>>>> return t > >>>>>>>> > >>>>>>>> Alternatively, from Knuth: > >>>>>>>> > >>>>>>>> def var_knuth(a): > >>>>>>>> m = a.dtype.type(0) > >>>>>>>> variance = a.dtype.type(0) > >>>>>>>> for i in range(len(a)): > >>>>>>>> delta = a[i] - m > >>>>>>>> m += delta / (i+1) > >>>>>>>> variance += delta * (a[i] - m) > >>>>>>>> variance /= len(a) > >>>>>>>> return variance > >>>>>>>> > >>>> I'm going to go ahead and attach a module containing the versions of > >>>> mean, var, etc that I've been playing with in case someone wants to > mess > >>>> with them. Some were stolen from traffic on this list, for others I > >>>> grabbed the algorithms from wikipedia or equivalent. > >>>> > >>> I looked into this a bit more. I checked float32 (single precision) > and > >>> float64 (double precision), using long doubles (float96) for the > "exact" > >>> results. This is based on your code. Results are compared using > >>> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values > in > >>> the range of [-100, 900] > >>> > >>> First, the mean. In float32, the Kahan summation in single precision > is > >>> better by about 2 orders of magnitude than simple summation. However, > >>> accumulating the sum in double precision is better by about 9 orders > of > >>> magnitude than simple summation (7 orders more than Kahan). 
> >>> > >>> In float64, Kahan summation is the way to go, by 2 orders of > magnitude. > >>> > >>> For the variance, in float32, Knuth's method is *no better* than the > >>> two-pass method. Tim's code does an implicit conversion of > intermediate > >>> results to float64, which is why he saw a much better result. > >>> > >> Doh! And I fixed that same problem in the mean implementation earlier > >> too. I was astounded by how good knuth was doing, but not astounded > >> enough apparently. > >> > >> Does it seem weird to anyone else that in: > >> numpy_scalar <op> python_scalar > >> the precision ends up being controlled by the python scalar? I would > >> expect the numpy_scalar to control the resulting precision just like > >> numpy arrays do in similar circumstances. Perhaps the egg on my face is > >> just clouding my vision though. > >> > >> > >>> The two-pass method using > >>> Kahan summation (again, in single precision), is better by about 2 > orders > >>> of magnitude. There is practically no difference when using a > >>> double-precision accumulator amongst the techniques: they're all about > 9 > >>> orders of magnitude better than single-precision two-pass. > >>> > >>> In float64, Kahan summation is again better than the rest, by about 2 > >>> orders of magnitude. > >>> > >>> I've put my adaptation of Tim's code, and box-and-whisker plots of the > >>> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > >>> > >>> Conclusions: > >>> > >>> - If you're going to calculate everything in single precision, use > Kahan > >>> summation. Using it in double-precision also helps. > >>> - If you can use a double-precision accumulator, it's much better than > >>> any of the techniques in single-precision only. > >>> > >>> - for speed+precision in the variance, either use Kahan summation in > >>> single precision with the two-pass method, or use double precision > with > >>> simple summation with the two-pass method. Knuth buys you nothing, > except > >>> slower code :-) > >>> > >> The two pass methods are definitely more accurate. I won't be convinced > >> on the speed front till I see comparable C implementations slug it out. > >> That may well mean never in practice. However, I expect that somewhere > >> around 10,000 items, the cache will overflow and memory bandwidth will > >> become the bottleneck. At that point the extra operations of Knuth > won't > >> matter as much as making two passes through the array and Knuth will > win > >> on speed. Of course the accuracy is pretty bad at single precision, so > >> the possible, theoretical speed advantage at large sizes probably > >> doesn't matter. > >> > >> -tim > >> > >> > >>> After 1.0 is out, we should look at doing one of the above. > >>> > >> +1 > >> > >> > > Hi, > > I don't understand much of these "fancy algorithms", but does the above > mean > > that > > after 1.0 is released the float32 functions will catch up in terms of > > precision precision ("to some extend") with using dtype=float64 ? > > > It sounds like there is some consensus to do something to improve the > precision after 1.0 comes out. I don't think the details are entirely > nailed down, but it sounds like some combination of using Kahan > summation and increasing the minimum precision of the accumulator is > likely for sum, mean, var and std. I would go with double as the default except when the base precision is greater. Higher precisions should be available as a check, i.e., if the results look funny use Kahan or extended to see if roundoff is a problem. 
I think Kahan should be available because it adds precision to *any* base precision, even extended or quad, so there is always something to check against. But I don't think Kahan should be the default because of the speed hit. Chuck |
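A sketch (not from the message above) of the "two-pass method with a double-precision accumulator" option being weighed in this thread; the function name is made up and this is not NumPy's actual implementation.

import numpy as np

def two_pass_var(a):
    # First pass: mean accumulated in double precision.
    # Second pass: average squared deviation, also in double precision,
    # with the result cast back to the input precision.
    a = np.asarray(a)
    m = a.mean(dtype=np.float64)
    d = a.astype(np.float64) - m
    return a.dtype.type(np.mean(d * d))

vals = np.random.uniform(-100, 900, size=10000).astype(np.float32)
print(two_pass_var(vals), vals.var())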
From: Brian G. <ell...@gm...> - 2006-09-22 17:41:35
|
I want to be able to print an array in scientific notation.

I have seen the set_printoptions() function, but it doesn't really have an
option for *always* using scientific notation.

Can this be done? How?

Thanks

Brian |
From: Scott R. <sr...@nr...> - 2006-09-22 17:37:17
|
argmin is currently defined as using the argmax method!

From numpy/oldnumeric/functions.py:

def argmin(x, axis=-1):
    return N.argmax(x, axis)

Scott

--
Scott M. Ransom
Address: NRAO, 520 Edgemont Rd., Charlottesville, VA 22903 USA
Phone: (434) 296-0320
email: sr...@nr...
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
From: Travis O. <oli...@ie...> - 2006-09-22 17:23:37
|
Stefan van der Walt wrote:
> On Fri, Sep 22, 2006 at 02:17:57AM -0500, Robert Kern wrote:
>>> According to the putmask docstring:
>>>
>>>   a.putmask(values, mask) sets a.flat[n] = v[n] for each n where
>>>   mask.flat[n] is true. v can be scalar.
>>>
>>> This would mean that 'w' is not of the right length.
>>
>> There are 4 true values in m and 4 values in w. What's the wrong length?
>
> The way I read the docstring, you use putmask like this:
>
> In [4]: x = N.array([1,2,3,4])
>
> In [5]: x.putmask([4,3,2,1],[1,0,0,1])
>
> In [6]: x
> Out[6]: array([4, 2, 3, 1])
>
>> Out[9] and Out[18] should have been the same, but elements 6 and 9 are
>> flipped. It's pretty clear that this is a bug in .putmask().
>
> Based purely on what I read in the docstring, I would expect the above to do
>
> x[0] = w[0]
> x[6] = w[6]
> x[9] = w[9]
> x[11] = w[11]
>
> Since w is of length 4, you'll probably get indices modulo 4:
>
> w[6] == w[2] == -3
> w[9] == w[1] == -2
> w[11] == w[3] == -4
>
> Which seems to explain what you are seeing.

Yes, this does explain what you are seeing. It is the behavior of Numeric's
putmask (where this method came from). It does seem counter-intuitive, and I'm
not sure what to do with it. In some sense putmask should behave the same as
x[m] = w. But, on the other hand, was anybody actually using the "modular"
indexing "feature" of putmask?

Here are our options:

1) "Fix it" and risk breaking code for people who used putmask and the modular
   indexing "feature."
2) Get rid of it as a method (and keep it as a function so that oldnumeric can
   use it).
3) Keep everything the way it is.

-Travis |
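A sketch (not from the thread) contrasting the two behaviours under discussion: boolean-mask assignment consumes the values positionally for the True entries, while putmask cycles through the values by flat index modulo their length.

import numpy as np

x = np.arange(12)
m = np.zeros(12, dtype=bool)
m[[0, 6, 9, 11]] = True
w = np.array([-1, -2, -3, -4])

y = x.copy()
y[m] = w                 # positional: the 4 True slots get -1, -2, -3, -4
print(y[[0, 6, 9, 11]])  # [-1 -2 -3 -4]

z = x.copy()
np.putmask(z, m, w)      # modular: slot n gets w[n % len(w)]
print(z[[0, 6, 9, 11]])  # [-1 -3 -2 -4] -- elements 6 and 9 "flipped", as above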
From: Travis O. <oli...@ie...> - 2006-09-22 17:13:25
|
Christian Kristukat wrote:
> Bill Baxter <wbaxter <at> gmail.com> writes:
>
>> Yep, check the release notes:
>> http://www.scipy.org/ReleaseNotes/NumPy_1.0
>> search for 'take' on that page to find out what others have changed as well.
>> --bb
>
> Ok. Does axis=None then mean, that take(a, ind) operates on the flattened
> array? This is at least what it seems to be. I noticed that the ufunc behaves
> differently. a.take(ind) and a.take(ind, axis=0) behave the same, so the
> default argument to axis is 0 rather than None.

What do you mean? There is no "ufunc" take. There is a function take that just
calls the method. The default arguments for all functions that match methods
are the same as the methods (which means axis=None). However, in oldnumeric
(which pylab imports, by the way), the default axes are the same as they were
in Numeric. Also, if you have a 1-d array, then the axis argument doesn't make
any difference.

Please clarify what you are saying to be sure we don't have a bug floating
around.

-Travis |
From: Christopher B. <Chr...@no...> - 2006-09-22 16:34:51
|
Tim Hochberg wrote:
> It would probably be nice to expose the
> Kahan sum and maybe even the raw_kahan_sum somewhere.

What about using it for .sum() by default? What is the speed hit anyway? In
any case, having it available would be nice.

> I'm on the fence on using the array dtype for the accumulator dtype
> versus always using at least double precision for the accumulator. The
> former is easier to explain and is probably faster, but the latter is a
> lot more accuracy for basically free.

I don't think the difficulty of explanation is a big issue at all -- I'm
having a really hard time imagining someone getting confused and/or
disappointed that their single precision calculation didn't overflow or was
more accurate than expected. Anyone that did would know enough to understand
the explanation.

In general, users expect things to "just work". They only dig into the details
when something goes wrong, like the mean of a bunch of positive integers
coming out as negative (wasn't that the OP's example?). The fewer such
instances we have, the fewer questions we have.

> speeds shake out I suppose. If the speed of using float64 is comparable
> to that of using float32, we might as well.

Only testing will tell, but my experience is that with double precision FPUs,
doubles are just as fast as floats unless you're dealing with enough memory to
make a difference in caching. In this case, only the accumulator is double, so
that wouldn't be an issue. I suppose the float to double conversion could take
some time though...

> One thing I'm not on the
> fence about is the return type: it should always match the input type,
> or dtype if that is specified.

+1

> Since numpy-scalars are
> generally the results of indexing operations, not literals, I think that
> they should behave like arrays for purposes of determining the resulting
> precision, not like python-scalars.

+1

>> Of course the accuracy is pretty bad at single precision, so
>> the possible, theoretical speed advantage at large sizes probably
>> doesn't matter.

Good point. The larger the array, the more accuracy matters.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT   (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115   (206) 526-6317 main reception
Chr...@no... |
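A sketch (not from the message) of how a higher-precision accumulator can already be requested explicitly via the dtype argument to sum and mean, which is the behaviour being weighed here as a possible default; written in present-day NumPy idiom.

import numpy as np

a = np.full(10000000, 0.1, dtype=np.float32)

print(a.sum())                   # accumulate in the array's own precision (float32)
print(a.sum(dtype=np.float64))   # accumulate in double precision
print(a.mean(dtype=np.float64))  # the same dtype argument works for mean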
From: Tim H. <tim...@ie...> - 2006-09-22 16:06:36
|
Sebastian Haase wrote: > On Thursday 21 September 2006 15:28, Tim Hochberg wrote: > >> David M. Cooke wrote: >> >>> On Thu, 21 Sep 2006 11:34:42 -0700 >>> >>> Tim Hochberg <tim...@ie...> wrote: >>> >>>> Tim Hochberg wrote: >>>> >>>>> Robert Kern wrote: >>>>> >>>>>> David M. Cooke wrote: >>>>>> >>>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: >>>>>>> >>>>>>>> Let me offer a third path: the algorithms used for .mean() and >>>>>>>> .var() are substandard. There are much better incremental algorithms >>>>>>>> that entirely avoid the need to accumulate such large (and therefore >>>>>>>> precision-losing) intermediate values. The algorithms look like the >>>>>>>> following for 1D arrays in Python: >>>>>>>> >>>>>>>> def mean(a): >>>>>>>> m = a[0] >>>>>>>> for i in range(1, len(a)): >>>>>>>> m += (a[i] - m) / (i + 1) >>>>>>>> return m >>>>>>>> >>>>>>> This isn't really going to be any better than using a simple sum. >>>>>>> It'll also be slower (a division per iteration). >>>>>>> >>>>>> With one exception, every test that I've thrown at it shows that it's >>>>>> better for float32. That exception is uniformly spaced arrays, like >>>>>> linspace(). >>>>>> >>>>>> > You do avoid >>>>>> > accumulating large sums, but then doing the division a[i]/len(a) >>>>>> > and adding that will do the same. >>>>>> >>>>>> Okay, this is true. >>>>>> >>>>>> >>>>>>> Now, if you want to avoid losing precision, you want to use a better >>>>>>> summation technique, like compensated (or Kahan) summation: >>>>>>> >>>>>>> def mean(a): >>>>>>> s = e = a.dtype.type(0) >>>>>>> for i in range(0, len(a)): >>>>>>> temp = s >>>>>>> y = a[i] + e >>>>>>> s = temp + y >>>>>>> e = (temp - s) + y >>>>>>> return s / len(a) >>>>>>> >>>>>>> >>>>>>>> def var(a): >>>>>>>> m = a[0] >>>>>>>> t = a.dtype.type(0) >>>>>>>> for i in range(1, len(a)): >>>>>>>> q = a[i] - m >>>>>>>> r = q / (i+1) >>>>>>>> m += r >>>>>>>> t += i * q * r >>>>>>>> t /= len(a) >>>>>>>> return t >>>>>>>> >>>>>>>> Alternatively, from Knuth: >>>>>>>> >>>>>>>> def var_knuth(a): >>>>>>>> m = a.dtype.type(0) >>>>>>>> variance = a.dtype.type(0) >>>>>>>> for i in range(len(a)): >>>>>>>> delta = a[i] - m >>>>>>>> m += delta / (i+1) >>>>>>>> variance += delta * (a[i] - m) >>>>>>>> variance /= len(a) >>>>>>>> return variance >>>>>>>> >>>> I'm going to go ahead and attach a module containing the versions of >>>> mean, var, etc that I've been playing with in case someone wants to mess >>>> with them. Some were stolen from traffic on this list, for others I >>>> grabbed the algorithms from wikipedia or equivalent. >>>> >>> I looked into this a bit more. I checked float32 (single precision) and >>> float64 (double precision), using long doubles (float96) for the "exact" >>> results. This is based on your code. Results are compared using >>> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in >>> the range of [-100, 900] >>> >>> First, the mean. In float32, the Kahan summation in single precision is >>> better by about 2 orders of magnitude than simple summation. However, >>> accumulating the sum in double precision is better by about 9 orders of >>> magnitude than simple summation (7 orders more than Kahan). >>> >>> In float64, Kahan summation is the way to go, by 2 orders of magnitude. >>> >>> For the variance, in float32, Knuth's method is *no better* than the >>> two-pass method. Tim's code does an implicit conversion of intermediate >>> results to float64, which is why he saw a much better result. >>> >> Doh! 
And I fixed that same problem in the mean implementation earlier >> too. I was astounded by how good knuth was doing, but not astounded >> enough apparently. >> >> Does it seem weird to anyone else that in: >> numpy_scalar <op> python_scalar >> the precision ends up being controlled by the python scalar? I would >> expect the numpy_scalar to control the resulting precision just like >> numpy arrays do in similar circumstances. Perhaps the egg on my face is >> just clouding my vision though. >> >> >>> The two-pass method using >>> Kahan summation (again, in single precision), is better by about 2 orders >>> of magnitude. There is practically no difference when using a >>> double-precision accumulator amongst the techniques: they're all about 9 >>> orders of magnitude better than single-precision two-pass. >>> >>> In float64, Kahan summation is again better than the rest, by about 2 >>> orders of magnitude. >>> >>> I've put my adaptation of Tim's code, and box-and-whisker plots of the >>> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ >>> >>> Conclusions: >>> >>> - If you're going to calculate everything in single precision, use Kahan >>> summation. Using it in double-precision also helps. >>> - If you can use a double-precision accumulator, it's much better than >>> any of the techniques in single-precision only. >>> >>> - for speed+precision in the variance, either use Kahan summation in >>> single precision with the two-pass method, or use double precision with >>> simple summation with the two-pass method. Knuth buys you nothing, except >>> slower code :-) >>> >> The two pass methods are definitely more accurate. I won't be convinced >> on the speed front till I see comparable C implementations slug it out. >> That may well mean never in practice. However, I expect that somewhere >> around 10,000 items, the cache will overflow and memory bandwidth will >> become the bottleneck. At that point the extra operations of Knuth won't >> matter as much as making two passes through the array and Knuth will win >> on speed. Of course the accuracy is pretty bad at single precision, so >> the possible, theoretical speed advantage at large sizes probably >> doesn't matter. >> >> -tim >> >> >>> After 1.0 is out, we should look at doing one of the above. >>> >> +1 >> >> > Hi, > I don't understand much of these "fancy algorithms", but does the above mean > that > after 1.0 is released the float32 functions will catch up in terms of > precision precision ("to some extend") with using dtype=float64 ? > It sounds like there is some consensus to do something to improve the precision after 1.0 comes out. I don't think the details are entirely nailed down, but it sounds like some combination of using Kahan summation and increasing the minimum precision of the accumulator is likely for sum, mean, var and std. -tim |
From: Charles R H. <cha...@gm...> - 2006-09-22 14:43:59
|
Hi, On 9/22/06, Tim Hochberg <tim...@ie...> wrote: > > Tim Hochberg wrote: > > David M. Cooke wrote: > > > >> On Thu, 21 Sep 2006 11:34:42 -0700 > >> Tim Hochberg <tim...@ie...> wrote: > >> > >> > >> > >>> Tim Hochberg wrote: > >>> > >>> > >>>> Robert Kern wrote: > >>>> > >>>> > >>>> > >>>>> David M. Cooke wrote: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Let me offer a third path: the algorithms used for .mean() and > .var() > >>>>>>> are substandard. There are much better incremental algorithms that > >>>>>>> entirely avoid the need to accumulate such large (and therefore > >>>>>>> precision-losing) intermediate values. The algorithms look like > the > >>>>>>> following for 1D arrays in Python: > >>>>>>> > >>>>>>> def mean(a): > >>>>>>> m = a[0] > >>>>>>> for i in range(1, len(a)): > >>>>>>> m += (a[i] - m) / (i + 1) > >>>>>>> return m > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> This isn't really going to be any better than using a simple sum. > >>>>>> It'll also be slower (a division per iteration). > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> With one exception, every test that I've thrown at it shows that > it's > >>>>> better for float32. That exception is uniformly spaced arrays, like > >>>>> linspace(). > >>>>> > >>>>> > You do avoid > >>>>> > accumulating large sums, but then doing the division a[i]/len(a) > and > >>>>> > adding that will do the same. > >>>>> > >>>>> Okay, this is true. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> Now, if you want to avoid losing precision, you want to use a > better > >>>>>> summation technique, like compensated (or Kahan) summation: > >>>>>> > >>>>>> def mean(a): > >>>>>> s = e = a.dtype.type(0) > >>>>>> for i in range(0, len(a)): > >>>>>> temp = s > >>>>>> y = a[i] + e > >>>>>> s = temp + y > >>>>>> e = (temp - s) + y > >>>>>> return s / len(a) > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>>>> def var(a): > >>>>>>> m = a[0] > >>>>>>> t = a.dtype.type(0) > >>>>>>> for i in range(1, len(a)): > >>>>>>> q = a[i] - m > >>>>>>> r = q / (i+1) > >>>>>>> m += r > >>>>>>> t += i * q * r > >>>>>>> t /= len(a) > >>>>>>> return t > >>>>>>> > >>>>>>> Alternatively, from Knuth: > >>>>>>> > >>>>>>> def var_knuth(a): > >>>>>>> m = a.dtype.type(0) > >>>>>>> variance = a.dtype.type(0) > >>>>>>> for i in range(len(a)): > >>>>>>> delta = a[i] - m > >>>>>>> m += delta / (i+1) > >>>>>>> variance += delta * (a[i] - m) > >>>>>>> variance /= len(a) > >>>>>>> return variance > >>>>>>> > >>>>>>> > >>> I'm going to go ahead and attach a module containing the versions of > >>> mean, var, etc that I've been playing with in case someone wants to > mess > >>> with them. Some were stolen from traffic on this list, for others I > >>> grabbed the algorithms from wikipedia or equivalent. > >>> > >>> > >> I looked into this a bit more. I checked float32 (single precision) and > >> float64 (double precision), using long doubles (float96) for the > "exact" > >> results. This is based on your code. Results are compared using > >> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values > in the > >> range of [-100, 900] > >> > >> First, the mean. In float32, the Kahan summation in single precision is > >> better by about 2 orders of magnitude than simple summation. However, > >> accumulating the sum in double precision is better by about 9 orders of > >> magnitude than simple summation (7 orders more than Kahan). 
> >> > >> In float64, Kahan summation is the way to go, by 2 orders of magnitude. > >> > >> For the variance, in float32, Knuth's method is *no better* than the > two-pass > >> method. Tim's code does an implicit conversion of intermediate results > to > >> float64, which is why he saw a much better result. > >> > > Doh! And I fixed that same problem in the mean implementation earlier > > too. I was astounded by how good knuth was doing, but not astounded > > enough apparently. > > > > Does it seem weird to anyone else that in: > > numpy_scalar <op> python_scalar > > the precision ends up being controlled by the python scalar? I would > > expect the numpy_scalar to control the resulting precision just like > > numpy arrays do in similar circumstances. Perhaps the egg on my face is > > just clouding my vision though. > > > > > >> The two-pass method using > >> Kahan summation (again, in single precision), is better by about 2 > orders of > >> magnitude. There is practically no difference when using a > double-precision > >> accumulator amongst the techniques: they're all about 9 orders of > magnitude > >> better than single-precision two-pass. > >> > >> In float64, Kahan summation is again better than the rest, by about 2 > orders > >> of magnitude. > >> > >> I've put my adaptation of Tim's code, and box-and-whisker plots of the > >> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/ > >> > >> Conclusions: > >> > >> - If you're going to calculate everything in single precision, use > Kahan > >> summation. Using it in double-precision also helps. > >> - If you can use a double-precision accumulator, it's much better than > any of > >> the techniques in single-precision only. > >> > >> - for speed+precision in the variance, either use Kahan summation in > single > >> precision with the two-pass method, or use double precision with simple > >> summation with the two-pass method. Knuth buys you nothing, except > slower > >> code :-) > >> > >> > > The two pass methods are definitely more accurate. I won't be convinced > > on the speed front till I see comparable C implementations slug it out. > > That may well mean never in practice. However, I expect that somewhere > > around 10,000 items, the cache will overflow and memory bandwidth will > > become the bottleneck. At that point the extra operations of Knuth won't > > matter as much as making two passes through the array and Knuth will win > > on speed. > OK, let me take this back. I looked at this speed effect a little more > and the effect is smaller than I remember. For example, "a+=a" slows > down by about a factor of 2.5 on my box between 10,000 and 100,0000 > elements. That's not insignificant, but assuming (naively) that this > means that memory effects account to the equivalent of 1.5 add > operations, this doesn't come close to closing to gap between Knuth and > the two pass approach. The other thing is that while a+=a and similar > operations show this behaviour, a.sum() and add.reduce(a) do not, > hinting that perhaps it's only writing to memory that is a bottleneck. > Or perhaps hinting that my mental model of what's going on here is badly > flawed. > > So, +1 for me on Kahan summation for computing means and two pass with > Kahan summation for variances. It would probably be nice to expose the > Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for > the latter in summing a bunch of disparate matrices with high precision. 
> > I'm on the fence on using the array dtype for the accumulator dtype > versus always using at least double precision for the accumulator. I keep going back and forth on this myself. So I ask myself what I do in practice. In practice, I accumulate in doubles, sometimes extended precision. I only think of accumulating in floats when the number of items is small and computation might be noticeably faster -- and these days it seldom is. The days of the Vax and a 2x speed difference are gone, now we generally have FPUs whose base type is double with float tacked on as an afterthought. The only lesson from those days that remains pertinent is: never, ever, do division. So in practice, I would go with doubles. The only good reason to use floats for floats is a sort of pure consistency. Chuck |
From: Bill B. <wb...@gm...> - 2006-09-22 14:41:37
|
On 9/22/06, Alan G Isaac <ai...@am...> wrote:
> On Fri, 22 Sep 2006, Bill Baxter apparently wrote:
> > Ok, here's my best shot at a generalized repmat:
>
> Sorry to always repeat this suggestion when
> it comes to repmat, but I think the whole approach
> is wrong. A repmat should be a separate object type,
> which behaves like the described matrix but has only one
> copy of the repetitive data.

That may be true for some cases. But I usually start modifying the data I
create right after a repmat, and a shared-data object wouldn't help in that
case. So unless you're really making a lot of large repmats of arrays that
never change, or for use as temp variables, I can't see a separate class being
that much of a win compared with the complexity of implementing and
maintaining it (think "fancy indexing"). YMMV.

However, repmat seems to be far less commonly needed in numpy than in Matlab.
I think that's mostly thanks to the broadcasting rules, which already create a
sort of implicit repmat of the input in many common cases.

--bb |
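A sketch (not from the thread) of the point about broadcasting acting as an implicit repmat; np.tile plays the role of an explicit, Matlab-style repmat here, and the array values are made up for illustration.

import numpy as np

row = np.array([1.0, 2.0, 3.0])
col = np.array([[10.0], [20.0]])

# Explicit replication, repmat style:
explicit = np.tile(row, (2, 1)) + np.tile(col, (1, 3))

# Broadcasting does the same replication implicitly, without building copies:
implicit = row + col

print(np.array_equal(explicit, implicit))  # True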