From: Travis O. <oli...@ie...> - 2006-08-27 06:45:39
|
Matt Knox wrote:
> Hi there. I'm in the unfortunate situation of trying to track down a
> memory error in someone else's code, and to make matters worse I don't
> really know jack squat about C programming. The problem seems to arise
> when several numpy arrays are created from C arrays in the C API and
> returned to Python, and then trying to print out or cast to a string
> the resulting array. I think the problem may be happening due to the
> following chunk of code:
>
>     {
>         PyObject* temp = PyArray_SimpleNewFromData(1, &numobjs, typeNum, dbValues);
>         PyObject* temp2 = PyArray_FromArray((PyArrayObject*)temp,
>                               ((PyArrayObject*)temp)->descr, DEFAULT_FLAGS | ENSURECOPY);
>         Py_DECREF(temp);
>         PyDict_SetItemString(returnVal, "data", temp2);
>         Py_DECREF(temp2);
>     }
>
> Let's assume that all my other inputs up to this point are fine and that
> numobjs, typeNum, and dbValues are fine. Is there anything obviously
> wrong with the above chunk of code? Or does it appear OK? Ultimately the
> dictionary "returnVal" is returned by the function this code came from,
> and everything else is discarded. Any help is very greatly appreciated.
> Thanks in advance,

You didn't indicate what kind of trouble you are having.

First of all, this is a kind of odd style. Why is a new array created from a
data pointer and then copied using PyArray_FromArray (the ENSURECOPY flag will
give you a copy)? Using temp2 = PyArray_Copy(temp) seems simpler.

This will also avoid the reference-count problem that is currently happening
in the PyArray_FromArray call on the descr structure. Any array-creation
function that takes a descr structure "steals" a reference to it, so you need
to increment the reference count if you are passing an unowned reference to a
->descr structure.

-Travis
From: Travis O. <oli...@ie...> - 2006-08-27 06:37:15
|
Les Schaffer wrote:
> Travis E. Oliphant wrote:
>> Porting is not difficult especially using the compatibility layers
>> numpy.oldnumeric and numpy.numarray and the alter_code1.py modules in
>> those packages. The full C-API of Numeric is supported as is the C-API
>> of Numarray.
>
> this is not true of numpy.core.records (nee numarray.records):
>
> 1. numarray's records.py does not show up in numpy.numarray.

You're right. It's an oversight that needs to be corrected. NumPy has a very
capable records facility, and the great people at STScI have been very helpful
in pointing out issues to help make it work reasonably like the numarray
version. In addition, the records.py module started as a direct grab of the
numarray code base, so I think I may have mistakenly believed it was
equivalent. But it really should also be in the numarray compatibility module.
The same is true of the chararrays defined in numpy with respect to the
numarray.strings module.

> 2. my code that uses recarrays is now broken if i use numpy.core.records;
> for one thing, you have no .info attribute.

Not all the attributes are supported. The purpose of numpy.numarray.alter_code1
is to convert those attributes for you to numpy equivalents. In the case of
info, for example, there is the function numpy.numarray.info(self) instead of
self.info().

> another example: strings pushed into the arrays *apparently* were stripped
> automagically in the old recarray (so we coded appropriately), but now are
> not.

We could try to address this in the compatibility module (the raw ability is
available to deal with this exactly as numarray did). Someone with more
experience with numarray would really be able to help here, as I'm not as
aware of these kinds of issues until they are pointed out.

> 3. near zero docstrings for this module, hard to see how the new records
> works.

The records.py code has a lot of code taken and adapted from numarray nearly
directly. The docstrings present there were also copied over, but nothing more
was added. There is plenty of work to do on the docstrings in general. This is
an area that even newcomers can contribute to greatly. Contributions are
greatly welcome.

> 4. last year i made a case for the old records to return a list of the
> column names.

I prefer the word "field" names now, so as to avoid over-use of the word
"column", but one thing to understand about the record array is that it is a
pretty "simple" subclass. The basic ndarray, by itself, contains the essential
functionality of record arrays. The whole purpose of the record subclass is to
come up with an interface that is "easier to use" (right now that just means
allowing attribute access to the field names). Many may find that using the
ndarray directly is just what they want, and that they don't need the
attribute access allowed by the record-array subclass.

> it looks like the column names are now attributes of the record object, any
> chance of getting a list of them recarrayObj.get_colNames() or some such?

Right now, the field names are properties of the data-type object associated
with the array, so recarrayObj.dtype.names will give you a list. The data-type
object also has other properties which are useful.

Thanks for your review. We really need the help of as many numarray people as
possible to make sure that the transition for them is easier. I've tried very
hard to make sure that the numarray users have the tools they need to make the
transition easier, but I know that more could be done.

Unfortunately, my availability to help with this is rapidly waning, as I have
to move focus back to my teaching and research.

-Travis
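As a minimal sketch of the dtype.names lookup described above (the field names
and values here are invented for illustration, and mirror the rec.fromarrays
example Robert Kern posts elsewhere in this thread):

    import numpy as np

    # Hypothetical record array with two named fields.
    a = np.rec.fromarrays([np.zeros(3), np.arange(3)],
                          names='price,count', formats=[float, int])

    print(a.dtype.names)   # ('price', 'count') -- field names live on the dtype
    print(a.price)         # attribute access added by the recarray subclass
    print(a['count'])      # plain ndarray field access works as well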
From: Charles R H. <cha...@gm...> - 2006-08-27 02:03:50
|
Hi,

On 8/18/06, Sebastian Haase <ha...@ms...> wrote:
<snip>
> Thanks, that seems to be a handy "dictionary-like object"
>
> Just for the record - in the meantime I found this:
> >>> N.dtype(N.int32).itemsize
> 4

And on x86_64 Linux, Python ints are 8 bytes:

In [15]: asarray([1])[0].itemsize
Out[15]: 8

Interesting. Looks like one needs to be careful about the builtin Python types.

Chuck
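A small sketch of the size difference Chuck points out; the exact values
printed depend on the platform (the comments assume a 64-bit Linux build):

    import numpy as np

    print(np.dtype(np.int32).itemsize)   # 4 everywhere
    print(np.dtype(np.int64).itemsize)   # 8 everywhere
    print(np.dtype(int).itemsize)        # platform dependent: typically 8 on 64-bit Linux
    print(np.asarray([1])[0].itemsize)   # same platform-dependent width as the default int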
From: Matt K. <mat...@ho...> - 2006-08-26 22:07:37
|
Hi there. I'm in the unfortunate situation of trying to track down a memory
error in someone else's code, and to make matters worse I don't really know
jack squat about C programming. The problem seems to arise when several numpy
arrays are created from C arrays in the C API and returned to Python, and then
trying to print out or cast to a string the resulting array. I think the
problem may be happening due to the following chunk of code:

    {
        PyObject* temp = PyArray_SimpleNewFromData(1, &numobjs, typeNum, dbValues);
        PyObject* temp2 = PyArray_FromArray((PyArrayObject*)temp,
                              ((PyArrayObject*)temp)->descr, DEFAULT_FLAGS | ENSURECOPY);
        Py_DECREF(temp);
        PyDict_SetItemString(returnVal, "data", temp2);
        Py_DECREF(temp2);
    }

Let's assume that all my other inputs up to this point are fine and that
numobjs, typeNum, and dbValues are fine. Is there anything obviously wrong
with the above chunk of code? Or does it appear OK? Ultimately the dictionary
"returnVal" is returned by the function this code came from, and everything
else is discarded. Any help is very greatly appreciated. Thanks in advance,

- Matt Knox
From: Robert K. <rob...@gm...> - 2006-08-26 21:38:46
|
Alan G Isaac wrote:
> Did Albert's initiative get any traction?
> http://www.mail-archive.com/num...@li.../msg01616.html
> If so, Les might profit from coordinating with him.

Not so much. Not many people showed up to the sprints, and most of those that
did were working on their slides for their talks at the actual conference.
Next year, sprints will come *after* the talks.

> Is the preferred approach, as Albert suggested,
> to submit documentation patches attached to tickets?

Yes.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco
From: Francesc A. <fa...@ca...> - 2006-08-26 21:22:42
|
On Saturday 26 August 2006 12:26, Travis Oliphant wrote:
> If frameis is 1-D, then you should be able to use
>
>     temp = data.take(frameis, axis=0)
>
> for the first step. This can be quite a bit faster (and is a big reason why
> take is still around). There are several reasons for this (one of which is
> that index checking is done over the entire list when using indexing).

Well, some days ago I stumbled on this as well. The NumPy manual says that
.take() is usually faster than fancy indexing, but my timings show that this
is no longer true in recent versions of NumPy:

In [56]: Timer("b.take(a)", "import numpy; a=numpy.arange(999,-1,-1, dtype='l'); b=a[:]").repeat(3,1000)
Out[56]: [0.28740906715393066, 0.20345211029052734, 0.20371079444885254]

In [57]: Timer("b[a]", "import numpy; a=numpy.arange(999,-1,-1, dtype='l'); b=a[:]").repeat(3,1000)
Out[57]: [0.20807695388793945, 0.11684703826904297, 0.11686491966247559]

I've done some profiling on this, and it seems that take uses the C memmove
call to copy the data, and this is *very* slow, at least on my platform (Linux
on Intel). Fancy indexing, on the other hand, seems to use an iterator, and
copying the elements one by one seems faster. I'd say that replacing memmove
with memcpy would make .take() much faster.

Regards,

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V  V   Cárabos Coop. V.   Enjoy Data
  "-"
From: Francesc A. <fa...@ca...> - 2006-08-26 21:00:59
|
On Saturday 26 August 2006 13:42, Bill Baxter wrote:
> On 8/26/06, Francesc Altet <fa...@ca...> wrote:
>> I'm personally an addict to encapsulating as much functionality as
>> possible in methods (but perhaps I'm biased by an insane use of TAB in
>> the ipython console).
>
> You can still get tab completion for functions: numpy.<TAB>
> Even if it's your custom to "from numpy import *" you can still also do an
> "import numpy" or "import numpy as N".

Yep, you are right. It is just that I tend to do that on the objects that I
manipulate and not with first-level functions in packages.

Anyway, I think I see now that these routines should not be methods, because
they modify the *actual* data in ndarrays.

Sorry for the digression,

--
>0,0<   Francesc Altet     http://www.carabos.com/
 V  V   Cárabos Coop. V.   Enjoy Data
  "-"
From: Alan G I. <ai...@am...> - 2006-08-26 20:59:50
|
> Les Schaffer wrote:
>> i'll pitch in some time to add docstrings, if i know they will be used.

On Sat, 26 Aug 2006, Robert Kern apparently wrote:
> Of course they will.

Did Albert's initiative get any traction?
http://www.mail-archive.com/num...@li.../msg01616.html
If so, Les might profit from coordinating with him.

Is the preferred approach, as Albert suggested, to submit documentation
patches attached to tickets?

Cheers,
Alan Isaac
From: Robert K. <rob...@gm...> - 2006-08-26 20:38:05
|
Les Schaffer wrote:
> i'll pitch in some time to add docstrings, if i know they will be used.

Of course they will.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco
From: Les S. <sch...@op...> - 2006-08-26 20:27:43
|
Alan G Isaac wrote:
> Of course I bothered to write because I read this list and appreciate, in
> addition to its helpfulness, that it generally maintains a more polite tone.
> This too has value.

so, you want to work on improving the documentation of this poorly documented
module? then let's get down to details. i'll pitch in some time to add
docstrings, if i know they will be used.

les
From: Alan G I. <ai...@am...> - 2006-08-26 20:23:03
|
On Sat, 26 Aug 2006, Les Schaffer apparently wrote:
> save the moral speech

I did not say anything about morals. I spoke only of *advantages* of
politeness, which someone age 52 might still need to ponder.

Of course I bothered to write because I read this list and appreciate, in
addition to its helpfulness, that it generally maintains a more polite tone.
This too has value.

Cheers,
Alan Isaac
From: Les S. <sch...@op...> - 2006-08-26 20:07:16
|
Alan G Isaac wrote:
> I am always mystified when someone requesting free help adopts a pissy tone
> if they do not immediately get what they wish.
>
> It reminds me a bit of my youngest child, age 7, whom I am still teaching
> the advantages of politeness.

you are referring to robert kern, i take it???? because i am 52. and relax, i
have given plenty of free help in my life, and constantly asked for it, pissy
tones and all. so save the moral speech for your friends.

les
From: Alan G I. <ai...@am...> - 2006-08-26 20:03:15
|
On Sat, 26 Aug 2006, Les Schaffer apparently wrote:
> congratulations, this can be the first docstring in records. now what about
> the incompatibility between old and new

I am always mystified when someone requesting free help adopts a pissy tone
if they do not immediately get what they wish.

It reminds me a bit of my youngest child, age 7, whom I am still teaching the
advantages of politeness.

Cheers,
Alan Isaac
From: Les S. <sch...@op...> - 2006-08-26 19:50:23
|
Robert Kern wrote:
> http://www.scipy.org/RecordArrays

which didn't help one iota. look, someone is charging for documentation, but
the claim is the free docstrings have docs. for the records module, this ain't
so. documentation means someone knows what is the complete public interface.
yes, examples help. earlier, you said:

> In [6]: a.dtype.names
> Out[6]: ('float', 'int')

congratulations, this can be the first docstring in records. now what about
the incompatibility between old and new.

les schaffer
From: Robert K. <rob...@gm...> - 2006-08-26 19:30:08
|
Les Schaffer wrote:
> 3. near zero docstrings for this module, hard to see how the new records
> works.

http://www.scipy.org/RecordArrays

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco
From: Robert K. <rob...@gm...> - 2006-08-26 19:28:39
|
Les Schaffer wrote:
> 4. last year i made a case for the old records to return a list of the
> column names. it looks like the column names are now attributes of the
> record object, any chance of getting a list of them
> recarrayObj.get_colNames() or some such? yes, in working code, we know what
> the names are, but in test code we are creating recarrays from parsing of
> Excel spreadsheets, and for testing purposes, its nice to know what records
> THINKS are the names of all the columns.

In [2]: from numpy import *

In [3]: a = rec.fromarrays([ones(10, dtype=float), ones(10, dtype=int)],
   ...:                    names='float,int', formats=[float, int])

In [4]: a
Out[4]:
recarray([(1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1),
          (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1)],
         dtype=[('float', '>f8'), ('int', '>i4')])

In [6]: a.dtype.names
Out[6]: ('float', 'int')

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco
From: Les S. <sch...@op...> - 2006-08-26 18:06:57
|
Travis E. Oliphant wrote:
> Porting is not difficult especially using the compatibility layers
> numpy.oldnumeric and numpy.numarray and the alter_code1.py modules in
> those packages. The full C-API of Numeric is supported as is the C-API of
> Numarray.

this is not true of numpy.core.records (nee numarray.records):

1. numarray's records.py does not show up in numpy.numarray.

2. my code that uses recarrays is now broken if i use numpy.core.records; for
one thing, you have no .info attribute. another example: strings pushed into
the arrays *apparently* were stripped automagically in the old recarray (so we
coded appropriately), but now are not.

3. near zero docstrings for this module, hard to see how the new records
works.

4. last year i made a case for the old records to return a list of the column
names. it looks like the column names are now attributes of the record object,
any chance of getting a list of them recarrayObj.get_colNames() or some such?
yes, in working code, we know what the names are, but in test code we are
creating recarrays from parsing of Excel spreadsheets, and for testing
purposes, its nice to know what records THINKS are the names of all the
columns.

Les Schaffer
From: Nick F. <nv...@MI...> - 2006-08-26 18:00:41
|
On Aug 26, 2006, at 7:05 AM, Keith Goodman wrote:
> On 8/26/06, Bill Baxter <wb...@gm...> wrote:
>> On 8/26/06, Travis Oliphant <oli...@ie...> wrote:
>>> I've come up with adding the functions (not methods at this point)
>>>
>>>     deletefrom
>>>     insertinto
>>
>> "delete" and "insert" really would be better. The current "insert"
>> function seems inaptly named. What it does sounds more like "overlay" or
>> "set_masked".
>
> I prefer delete and insert too. I guess it is OK that del and delete are
> similar (?)

How about "deleted" and "inserted" to parallel "sorted"? "delete" and
"insert" sound very imperative and side-effects-ish.

Nick
From: Tim H. <tim...@ie...> - 2006-08-26 18:00:11
|
Martin Spacek wrote:
> Hello,
>
> I'm a bit ignorant of optimization in numpy.
>
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8 with
> shape (65535, 32, 32). I load it from an open file f like this:
>
> >>> import numpy as np
> >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
> >>> data = data.reshape(65535, 32, 32)
>
> I'm picking several thousand frames more or less randomly from throughout
> the movie and finding the mean frame over those frames:
>
> >>> meanframe = data[frameis].mean(axis=0)
>
> frameis is a 1D array of frame indices with no repeated values in it. If it
> has say 4000 indices in it, then the above line takes about 0.5 sec to
> complete on my system. I'm doing this for a large number of different
> frameis, some of which can have many more indices in them. All this takes
> many minutes to complete, so I'm looking for ways to speed it up.
>
> If I divide it into 2 steps:
>
> >>> temp = data[frameis]
> >>> meanframe = temp.mean(axis=0)
>
> and time it, I find the first step takes about 0.2 sec, and the second takes
> about 0.3 sec. So it's not just the mean() step, but also the indexing step
> that's taking some time.
>
> If I flatten with ravel:
>
> >>> temp = data[frameis].ravel()
> >>> meanframe = temp.mean(axis=0)
>
> then the first step still takes about 0.2 sec, but the mean() step drops to
> about 0.1 sec. But of course, this is taking a flat average across all
> pixels in the movie, which isn't what I want to do.
>
> I have a feeling that the culprit is the non-contiguity of the movie frames
> being averaged, but I don't know how to proceed.
>
> Any ideas? Could reshaping the data somehow speed things up? Would
> weave.blitz or weave.inline or pyrex help?
>
> I'm running numpy 0.9.8
>
> Thanks,
>
> Martin

Martin,

Here's an approach (mean_accumulate) that avoids making any copies of the
data. It runs almost 4x as fast as your approach (called baseline here) on my
box. Perhaps this will be useful:

    import random
    import numpy as np

    frames = 65535
    samples = 4000

    data = (256 * np.random.random((frames, 32, 32))).astype(np.uint8)
    indices = np.arange(frames)
    random.shuffle(indices)
    indices = indices[:samples]

    def mean_baseline(data, indices):
        return data[indices].mean(axis=0)

    def mean_accumulate(data, indices):
        result = np.zeros([32, 32], float)
        for i in indices:
            result += data[i]
        result /= len(indices)
        return result

    if __name__ == "__main__":
        import timeit
        print mean_baseline(data, indices)[0, :8]
        print timeit.Timer("s.mean_baseline(s.data, s.indices)",
                           "import scratch as s").timeit(10)
        print mean_accumulate(data, indices)[0, :8]
        print timeit.Timer("s.mean_accumulate(s.data, s.indices)",
                           "import scratch as s").timeit(10)

This gives:

    [ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
      128.5352   127.6205 ]
    3.95907664242
    [ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
      128.53525  127.6205 ]
    1.06913644053

I also wondered if sorting indices would help, since it would improve locality
of reference, but when I measured that, it appeared not to help at all.

regards,
-tim
From: Charles R H. <cha...@gm...> - 2006-08-26 17:49:35
|
On 8/26/06, Torgil Svensson <tor...@gm...> wrote:
> Hi
>
> ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I first
> thought I had a performance issue but discovered that std() used lots of
> memory and therefore caused lots of swapping.
>
> I want to get an array where element i is the standard deviation of row i
> in the 2D array. Using valgrind on the std() function ...
>
> $ valgrind --tool=massif python -c "from numpy import *;
>   a=reshape(arange(100000*100),(100000,100)).std(axis=1)"
>
> ... showed me a peak of 200Mb memory, while iterating line by line ...
>
> $ valgrind --tool=massif python -c "from numpy import *;
>   a=array([x.std() for x in reshape(arange(100000*100),(100000,100))])"
>
> ... got a peak of 40Mb memory.
>
> This seems unnecessary, since we know before the calculation what the
> output shape will be and should therefore be able to preallocate memory.
>
> My original problem was to get a moving average and a moving standard
> deviation (120k rows and N=1000). For the average I guess convolve should
> perform well, but is there anything smart for std()? For now I use ...

Why not use convolve for the std also? You can't subtract the average first,
but you could convolve the square of the vector and then use some variant of
std = sqrt((convsqrs - n*avg**2)/(n-1)). There are possible precision
problems, but they may not matter for your application, especially if the
moving window isn't really big.

Chuck
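A rough sketch of the convolution trick outlined above; the helper name
moving_std_conv is invented here, and the edge handling (mode='valid') and the
ddof=1 normalization are assumptions worth checking against a direct
computation:

    import numpy as np

    def moving_std_conv(a, n):
        # Moving standard deviation over a window of length n, using the
        # identity var = (sum(x**2) - n*mean**2) / (n - 1) on each window.
        kernel = np.ones(n)
        sums = np.convolve(a, kernel, mode='valid')         # windowed sum(x)
        sumsqrs = np.convolve(a * a, kernel, mode='valid')  # windowed sum(x**2)
        avg = sums / n
        var = (sumsqrs - n * avg * avg) / (n - 1)
        return np.sqrt(np.maximum(var, 0))  # clip tiny negatives from round-off

    # Compare against a direct per-window computation.
    a = np.random.rand(5000)
    n = 100
    direct = np.array([a[i:i + n].std(ddof=1) for i in range(len(a) - n + 1)])
    print(np.allclose(moving_std_conv(a, n), direct))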
From: Torgil S. <tor...@gm...> - 2006-08-26 17:02:56
|
Hi

ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I first
thought I had a performance issue but discovered that std() used lots of
memory and therefore caused lots of swapping.

I want to get an array where element i is the standard deviation of row i in
the 2D array. Using valgrind on the std() function ...

    $ valgrind --tool=massif python -c "from numpy import *;
      a=reshape(arange(100000*100),(100000,100)).std(axis=1)"

... showed me a peak of 200Mb memory, while iterating line by line ...

    $ valgrind --tool=massif python -c "from numpy import *;
      a=array([x.std() for x in reshape(arange(100000*100),(100000,100))])"

... got a peak of 40Mb memory.

This seems unnecessary, since we know before the calculation what the output
shape will be and should therefore be able to preallocate memory.

My original problem was to get a moving average and a moving standard
deviation (120k rows and N=1000). For the average I guess convolve should
perform well, but is there anything smart for std()? For now I use ...

    >>> moving_std = array([a[i:i+n].std() for i in range(len(a)-n)])

which seems to perform quite well.

BR,
//Torgil
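One way to make the preallocation point above concrete, as a sketch (the
function name rowwise_std and the chunk size of 1000 rows are arbitrary
choices of mine):

    import numpy as np

    def rowwise_std(a, chunk=1000):
        # Standard deviation of each row, written into a preallocated output
        # array chunk by chunk, so peak memory scales with the chunk size
        # rather than with the whole input.
        out = np.empty(a.shape[0], dtype=float)
        for start in range(0, a.shape[0], chunk):
            stop = min(start + chunk, a.shape[0])
            out[start:stop] = a[start:stop].std(axis=1)
        return out

    a = np.arange(100000 * 100, dtype=float).reshape(100000, 100)
    print(rowwise_std(a)[:5])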
From: Charles R H. <cha...@gm...> - 2006-08-26 16:35:17
|
Hi,

On 8/26/06, Albert Strasheim <fu...@gm...> wrote:
> A complete code snippet that reproduces the bug would be most helpful.

+1. I too suspect that what you have here is a reference/copy problem. The
only thing that is local to the class is the reference (pointer); the data is
global.

Chuck
From: Charles R H. <cha...@gm...> - 2006-08-26 16:30:01
|
Hi,

On 8/26/06, Keith Goodman <kwg...@gm...> wrote:
> On 8/26/06, Bill Baxter <wb...@gm...> wrote:
>> On 8/26/06, Travis Oliphant <oli...@ie...> wrote:
>>> I've come up with adding the functions (not methods at this point)
>>>
>>>     deletefrom
>>>     insertinto
>>
>> "delete" and "insert" really would be better. The current "insert"
>> function seems inaptly named. What it does sounds more like "overlay" or
>> "set_masked".
>
> I prefer delete and insert too. I guess it is OK that del and delete are
> similar (?)

Me too, although remove could be used instead of delete. Is there a problem
besides compatibility with removing or changing the old insert?

Chuck
From: Charles R H. <cha...@gm...> - 2006-08-26 16:22:34
|
Hi,

On 8/26/06, Bill Baxter <wb...@gm...> wrote:
> You're sure it's not just pass-by-reference semantics biting you? If you
> make an array and pass it to another class or function, by default they
> just get a reference to the same array. So e.g.:
>
>     a = numpy.array([1,2,3])
>     some_class.set_array(a)
>     a[1] = 10
>
> Then both the local 'a' and the 'a' that some_class has are now [1,10,3].
> If you don't want that sharing then you need to make an explicit copy of a
> by calling a.copy().
>
> Watch out for lists or dicts of arrays too. The python idiom for copying a
> list, new_list = list_orig[:], won't copy the contents of elements that are
> arrays. If you want to be sure to make complete copies of complex data
> structures, there's the deepcopy function of the copy module:
> new_list = copy.deepcopy(list_orig).
>
> I found a bunch of these sorts of bugs in some code I ported over from
> Matlab last week. Matlab uses copy semantics for everything,

Matlab does copy-on-write, so it maintains a reference until an element is
modified, at which point it makes a copy. I believe it does this for
efficiency and memory conservation, probably the latter, because it doesn't
seem to have garbage collection. I could be wrong about that, though.

Chuck
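A tiny sketch of the reference-versus-copy behavior Bill describes (the
variable names are arbitrary):

    import copy
    import numpy as np

    a = np.array([1, 2, 3])
    b = a              # another name for the same data
    b[1] = 10
    print(a)           # [ 1 10  3] -- the change is visible through both names

    c = a.copy()       # an explicit, independent copy
    c[0] = 99
    print(a)           # still [ 1 10  3]

    lst = [np.zeros(2), np.ones(2)]
    shallow = lst[:]               # copies the list, not the arrays inside it
    deep = copy.deepcopy(lst)      # copies the arrays as well
    shallow[0][0] = 7
    print(lst[0])                  # changed through the shallow copy
    print(deep[0])                 # the deep copy is unaffected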
From: Charles R H. <cha...@gm...> - 2006-08-26 16:02:56
|
Hi,

On 8/26/06, Sven Schreiber <sve...@gm...> wrote:
> Hi,
>
> is this normal behavior?:
>
> >>> import numpy as n; print n.mat(0.075).round(2); print n.mat(0.575).round(2)
> [[ 0.08]]
> [[ 0.57]]

In [7]: (arange(100)*.5).round()
Out[7]:
array([  0.,   0.,   1.,   2.,   2.,   2.,   3.,   4.,   4.,   4.,   5.,   6.,
         6.,   6.,   7.,   8.,   8.,   8.,   9.,  10.,  10.,  10.,  11.,  12.,
        12.,  12.,  13.,  14.,  14.,  14.,  15.,  16.,  16.,  16.,  17.,  18.,
        18.,  18.,  19.,  20.,  20.,  20.,  21.,  22.,  22.,  22.,  23.,  24.,
        24.,  24.,  25.,  26.,  26.,  26.,  27.,  28.,  28.,  28.,  29.,  30.,
        30.,  30.,  31.,  32.,  32.,  32.,  33.,  34.,  34.,  34.,  35.,  36.,
        36.,  36.,  37.,  38.,  38.,  38.,  39.,  40.,  40.,  40.,  41.,  42.,
        42.,  42.,  43.,  44.,  44.,  44.,  45.,  46.,  46.,  46.,  47.,  48.,
        48.,  48.,  49.,  50.])

It looks like numpy does round-to-even. Knuth has a discussion of rounding
that is worth reading, although he prefers round-to-odd. The basic idea is to
avoid the systematic bias that comes from always rounding in one direction.

Another thing to bear in mind is that floating point isn't always what it
seems, due to the conversion between decimal and binary representation:

In [8]: print '%25.18f'%.075
     0.074999999999999997

Throw in multiplication, different precisions in the internal computations of
the fpu, rounding in the print routine, and other complications, and it is
tough to know precisely what should happen. For instance:

In [15]: '%25.18f'%(mat(0.575)*100)
Out[15]: '    57.499999999999992895'

In [16]: '%25.18f'%(around(mat(0.575)*100))
Out[16]: '    57.000000000000000000'

In [17]: '%25.18f'%(around(mat(0.575)*100)/100)
Out[17]: '     0.569999999999999951'

And you can see that .575, after conversion to IEEE floating point and
scaling, was properly rounded down and showed up as .57 once the default print
precision is taken into account. Python, on the other hand, always rounds up:

In [12]: for i in range(10): print '%25.18f'%round(i*.5)
   ....:
     0.000000000000000000
     1.000000000000000000
     1.000000000000000000
     2.000000000000000000
     2.000000000000000000
     3.000000000000000000
     3.000000000000000000
     4.000000000000000000
     4.000000000000000000
     5.000000000000000000

Chuck