From: eric j. <er...@en...> - 2002-06-11 22:00:09
|
> From: cbarker@localhost.localdomain [mailto:cbarker@localhost.localdomain]
>
> eric jones wrote:
> > The default axis choice influences how people choose to lay out their
> > data in arrays. If the default is to sum down columns, then users lay
> > out their data so that this is the order of computation.
>
> This is absolutely true. I definitely choose my data layout so that the
> various rank-reducing operators do what I want. Another reason to have
> consistency. So I don't really care which way is default, so the default
> might as well be the better performing option.
>
> Of course, compatibility with previous versions is helpful too... arrrgg!
>
> What kind of a performance difference are we talking here anyway?

Guess I ought to test instead of just saying it is so... I ran the following
test of summing 200 sets of 10000 numbers. I expected a speed-up of about 2...
I didn't get it. They are pretty much the same speed on my machine. (more later)

C:\WINDOWS\system32>python
ActivePython 2.2.1 Build 222 (ActiveState Corp.) based on
Python 2.2.1 (#34, Apr 15 2002, 09:51:39) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Numeric import *
>>> import time
>>> a = ones((10000,200),Float) * arange(10000)[:,NewAxis]
>>> b = ones((200,10000),Float) * arange(10000)[NewAxis,:]
>>> t1 = time.clock();x=sum(a,axis=0);t2 = time.clock();print t2-t1
0.0772411018719
>>> t1 = time.clock();x=sum(b,axis=-1);t2 = time.clock();print t2-t1
0.079615705348

I also tried FFT, and did see a difference -- a speed-up of 1.5+:

>>> q = ones((1024,1024),Float)
>>> t1 = time.clock();x = FFT.fft(q,axis=0);t2 = time.clock();print t2-t1
0.907373143793
>>> t1 = time.clock();x= FFT.fft(q,axis=-1);t2 = time.clock();print t2-t1
0.581641800843
>>> .907/.581
1.5611015490533564

Same in scipy:

>>> from scipy import *
>>> a = ones((1024,1024),Float)
>>> import time
>>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1
0.870259488287
>>> t1 = time.clock(); q = fft(a,axis=-1); t2 = time.clock();print t2-t1
0.489512214541
>>> t1 = time.clock(); q = fft(a,axis=0); t2 = time.clock();print t2-t1
0.849266317367
>>> .849/.489
1.7361963190184049

So why is sum() the same speed for both cases? I don't know. I wrote a quick C
program that is similar to how Numeric loops work, and I saw about a factor of
4 improvement by summing rows instead of columns:

C:\home\eric\wrk\axis_speed>gcc -O2 axis.c
C:\home\eric\wrk\axis_speed>a
summing rows (sec): 0.040000
summing columns (sec): 0.160000
pass

These numbers are more like what I expected to see in the Numeric tests, but
they are strange when compared to the Numeric timings -- the row sum is twice
as fast as Numeric while the column sum is twice as slow. Because all the work
is done in C and we're summing reasonably long arrays, the Numeric and C
versions should be roughly the same speed. I can understand why summing rows
is twice as fast in my C routine -- the Numeric loop code is not going to win
awards for being optimal. What I don't understand is why the column summation
is twice as slow in my C code as in Numeric. This should not be. I've posted
it below in case someone can enlighten me.

I think in general you should see a speed-up of 1.5+ when summing over the
"faster" axis. This holds true for fft in Python and my sum in C. As to why I
don't in Numeric's sum(), I'm not sure. It is certainly true that non-strided
access makes the best use of cache and *usually* is faster.
eric

--------------------------------------------------------------------------

#include <stdio.h>
#include <malloc.h>
#include <time.h>

int main()
{
    double *a, *sum1, *sum2;
    int i, j, si, sj, ind, I, J;
    int small=200, big=10000;
    time_t t1, t2;

    I = small; J = big;
    si = big; sj = 1;
    a = (double*)malloc(I*J*sizeof(double));
    sum1 = (double*)malloc(small*sizeof(double));
    sum2 = (double*)malloc(small*sizeof(double));

    //set memory
    for(i = 0; i < I; i++)
    {
        sum1[i] = 0;
        sum2[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            a[ind] = (double)j;
            ind += sj;
        }
        ind += si;
    }

    t1 = clock();
    for(i = 0; i < I; i++)
    {
        sum1[i] = 0;
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            sum1[i] += a[ind];
            ind += sj;
        }
        ind += si;
    }
    t2 = clock();
    printf("summing rows (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    I = big; J = small;
    sj = big; si = 1;
    t1 = clock();
    //set memory
    for(i = 0; i < I; i++)
    {
        ind = si * i;
        for(j = 0; j < J; j++)
        {
            a[ind] = (double)i;
            ind += sj;
        }
        ind += si;
    }
    for(j = 0; j < J; j++)
    {
        sum2[j] = 0;
        ind = sj * j;
        for(i = 0; i < I; i++)
        {
            sum2[j] += a[ind];
            ind += si;
        }
    }
    t2 = clock();
    printf("summing columns (sec): %f\n", (t2-t1)/(float)CLOCKS_PER_SEC);

    for (i=0; i < small; i++)
    {
        if(sum1[i] != sum2[i])
            printf("failure %d, %f %f\n", i, sum1[i], sum2[i]);
    }
    printf("pass %f\n", sum1[0]);
    return 0;
}
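For readers re-running this kind of comparison today, here is a minimal sketch of the same experiment, assuming a modern NumPy installation (an assumption -- the thread above predates it). As eric found with Numeric's sum(), whether the contiguous axis actually wins depends on how the library implements the reduction, so treat the comments as expectations rather than guarantees.

import time
import numpy as np

# Same data layout as above: 200 sums of 10000 numbers each.
a = np.ones((10000, 200)) * np.arange(10000)[:, np.newaxis]   # sum down columns
b = np.ones((200, 10000)) * np.arange(10000)[np.newaxis, :]   # sum along rows

def best_time(fn, repeat=10):
    # Best wall-clock time over a few repeats, to reduce timer noise.
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

print("axis=0  on (10000, 200):", best_time(lambda: a.sum(axis=0)))
print("axis=-1 on (200, 10000):", best_time(lambda: b.sum(axis=-1)))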
From: Scott R. <ra...@ph...> - 2002-06-11 21:07:34
|
On June 11, 2002 04:56 pm, you wrote: > One can make a case for allowing == and != for complex arrays, but > > just doesn't make sense and should not be allowed. It depends if you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable comparison. You _could_ do the same thing with the phases, except you run into the modulo 2pi thing... Scott > > -----Original Message----- > > From: num...@li... > > [mailto:num...@li...] On > > Behalf Of Perry Greenfield > > Sent: Tuesday, June 11, 2002 11:52 AM > > To: eric jones; 'Konrad Hinsen' > > Cc: num...@li... > > Subject: RE: [Numpy-discussion] RE: default axis for numarray > > > > > > <Eric Jones writes>: > > > > <Konrad Hinsen writes>: > > > > What needs to be improved in that area? > > > > > > Comparisons of complex numbers. But lets save that debate > > > > for later. > > > > > > No, no, let's do it now. ;-) We for one would like to know for > > numarray what should be done. > > > > If I might be presumptious enough to anticipate what Eric > > would say, it is that complex comparisons should be allowed, > > and that they use all the information in the complex number > > (real and imaginary) so that they lead to consistent results > > in sorting. > > > > But the purist argues that comparisons for complex numbers > > are meaningless. Well, yes, but there are cases in code where you > > don't which such comparisons to cause an exception. But even > > more important, there is at least one case which is > > practical. It isn't all that uncommon to want to eliminate > > duplicate values from arrays, and one would like to be able > > to do that for > > complex values as well. A common technique is to sort the > > values and then eliminate all identical adjacent values. A > > predictable comparison rule would allow that to be easily implemented. > > > > Eric, am I missing anything in this? It should be obvious > > that we agree with his position, but I am wondering if there > > are any arguments we have not heard yet that outweigh the > > advantages we see. > > > > Perry > > > > _______________________________________________________________ > > > > Multimillion Dollar Computer Inventory > > Live Webcast Auctions Thru Aug. 2002 - > > http://www.cowanalexander.com/calendar > > > > > > > > > > _______________________________________________ > > Numpy-discussion mailing list Num...@li... > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > _______________________________________________________________ > > Multimillion Dollar Computer Inventory > Live Webcast Auctions Thru Aug. 2002 - > http://www.cowanalexander.com/calendar > > > > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ra...@ph... Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
From: Paul F D. <pa...@pf...> - 2002-06-11 20:56:43
|
One can make a case for allowing == and != for complex arrays, but < and >
just don't make sense and should not be allowed.

> -----Original Message-----
> From: num...@li...
> [mailto:num...@li...] On Behalf Of Perry Greenfield
> Sent: Tuesday, June 11, 2002 11:52 AM
> To: eric jones; 'Konrad Hinsen'
> Cc: num...@li...
> Subject: RE: [Numpy-discussion] RE: default axis for numarray
>
> <Eric Jones writes>:
> <Konrad Hinsen writes>:
> > > What needs to be improved in that area?
> >
> > Comparisons of complex numbers. But let's save that debate for later.
>
> No, no, let's do it now. ;-) We for one would like to know for numarray
> what should be done.
>
> If I might be presumptuous enough to anticipate what Eric would say, it
> is that complex comparisons should be allowed, and that they use all the
> information in the complex number (real and imaginary) so that they lead
> to consistent results in sorting.
>
> But the purist argues that comparisons for complex numbers are
> meaningless. Well, yes, but there are cases in code where you don't wish
> such comparisons to cause an exception. But even more important, there
> is at least one case which is practical. It isn't all that uncommon to
> want to eliminate duplicate values from arrays, and one would like to be
> able to do that for complex values as well. A common technique is to
> sort the values and then eliminate all identical adjacent values. A
> predictable comparison rule would allow that to be easily implemented.
>
> Eric, am I missing anything in this? It should be obvious that we agree
> with his position, but I am wondering if there are any arguments we have
> not heard yet that outweigh the advantages we see.
>
> Perry
From: Paul F D. <pa...@pf...> - 2002-06-11 20:54:02
|
MA users seem to all be happy with the facility in MA for limiting printing.

>>> x=MA.arange(20)
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,])
>>> MA.set_print_limit(10)
>>> x
array([0,1,2,3,4,5,6,7,8,9,] + 10 more elements)
>>> print x
[0,1,2,3,4,5,6,7,8,9,] + 10 more elements
>>> MA.set_print_limit(0) # no limit
>>> x
array([ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19])

> -----Original Message-----
> From: num...@li...
> [mailto:num...@li...] On Behalf Of Tim Hochberg
> Sent: Tuesday, June 11, 2002 12:29 PM
> To: num...@li...
> Subject: Re: [Numpy-discussion] repr for numarray
>
> I would also be inclined toward option 3 with the caveat that
> THRESHOLD=None should print all the values for the purists out there
> (or if you want to use repr to dump the array to some sort of flat file).
>
> -tim
>
> > On June 11, 2002 02:43 pm, Perry Greenfield wrote:
> > > Yet on the other hand, it is undeniably convenient to use repr (by
> > > typing a variable) for small arrays interactively rather than using
> > > a print statement. This leads to 3 possible proposals for handling
> > > repr:
> > >
> > > 1) Do what is done now, always print a string that when eval'ed will
> > > recreate the array.
> > >
> > > 2) Only give summary information for the array regardless of its
> > > size.
> > >
> > > 3) Print the array if it has fewer than THRESHOLD number of
> > > elements, otherwise print a summary. THRESHOLD may be adjusted by
> > > the user.
> > >
> > > The last appears to be the most utilitarian to us, yet 'impure'
> > > somehow. Certainly there are many objects for which
> >
> > I vote for number 3, and have no hang-ups about any real or perceived
> > "impurity". This is an issue that I deal with daily.
> >
> > Scott
> >
> > --
> > Scott M. Ransom              Address: McGill Univ. Physics Dept.
> > Phone: (514) 398-6492                 3600 University St., Rm 338
> > email: ra...@ph...                    Montreal, QC Canada H3A 2T8
> > GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
From: Tim H. <tim...@ie...> - 2002-06-11 19:29:19
|
I would also be inclined toward option 3 with the caveat that THRESHOLD=None should print all the values for the purists out there (or if you want to use repr to dump the array to some sort of flat file). -tim > On June 11, 2002 02:43 pm, Perry Greenfield wrote: > > > Yet on the other hand, it is undeniably convenient to use > > repr (by typing a variable) for small arrays interactively > > rather than using a print statement. This leads to 3 possible > > proposals for handling repr: > > > > 1) Do what is done now, always print a string that when > > eval'ed will recreate the array. > > > > 2) Only give summary information for the array regardless of > > its size. > > > > 3) Print the array if it has fewer than THRESHOLD number of > > elements, otherwise print a summary. THRESHOLD may be adjusted > > by the user. > > > > The last appears to be the most utilitarian to us, yet > > 'impure' somehow. Certainly there are may objects for which > > I vote for number 3, and have no hang-ups about any real or perceived > "impurity". This is an issue that I deal with daily. > > Scott > > > -- > Scott M. Ransom Address: McGill Univ. Physics Dept. > Phone: (514) 398-6492 3600 University St., Rm 338 > email: ra...@ph... Montreal, QC Canada H3A 2T8 > GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 > > _______________________________________________________________ > > Multimillion Dollar Computer Inventory > Live Webcast Auctions Thru Aug. 2002 - http://www.cowanalexander.com/calendar > > > > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |
From: Konrad H. <hi...@cn...> - 2002-06-11 19:26:33
|
"Perry Greenfield" <pe...@st...> writes: > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. > > The last appears to be the most utilitarian to us, yet > 'impure' somehow. Certainly there are may objects for which > Python does not attempt to generate a string from repr that > could be used with eval to recreate them. On the other hand, > we are unaware of cases where repr sometimes does and sometimes I don't see the problem. The documented behaviour would be that it doesn't allow reconstruction. If for some arrays that works nevertheless, who is going to complain? BTW, it would be nice if the summary would contain the values of some elements, to allow a quick identification of NaN arrays and similar problems. > does not. For example, strings may also get very large, but > there is no threshold for generating the string. Right. But in practice strings rarely do get that large. Arrays do. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Greg B. <gb...@cf...> - 2002-06-11 19:24:16
|
> 1) Do what is done now, always print a string that when
> eval'ed will recreate the array.
>
> 2) Only give summary information for the array regardless of
> its size.
>
> 3) Print the array if it has fewer than THRESHOLD number of
> elements, otherwise print a summary. THRESHOLD may be adjusted
> by the user.

I vote for 3) too.

Especially annoying is when I mistakenly type a.shape instead of a.shape()
interactively. Without the parentheses I get a bound method, the repr of which
includes the repr for the whole array, and when this has > 25 million elements
it really is a drag to wait for it all to finish spewing out...

Getting sidetracked... is this repr of methods a feature?

>>> l = [1,2,3,4]
>>> l.sort
<built-in method sort of list object at 0x402d4eec>
>>> a = numarray.array(l)
>>> a.shape
<bound method NumArray.shape of array([1, 2, 3, 4])>

It would seem more pythonic to get

<bound method NumArray.shape of array object at 0x402d4bcc>

or similar?

-- Greg Ball
From: Konrad H. <hi...@cn...> - 2002-06-11 19:15:19
|
> I think the consistency with Python is less of an issue than it seems. > I wasn't aware that add.reduce(x) would generated the same results as > the Python version of reduce(add,x) until Perry pointed it out to me. It is an issue in much of my code, which contains stuff written with NumPy in mind as well as code using only standard Python operations (i.e. reduce()) which might however be applied to array objects. I also use arrays and nested lists interchangeably in many situations (NumPy functions accept nested lists instead of array arguments). Especially in interactive use, nested lists are easier to type. > There are some inconsistencies between Python the language and Numeric > because the needs of the Numeric community. For instance, slices create > views instead of copies as in Python. This was a correct break with True, but this affects much fewer programs. Most of my code never modifies arrays after their creation, and then the difference in indexing behaviour doesn't matter. > I don't see choosing axis=-1 as a break with Python -- multi-dimensional > arrays are inherently different and used differently than lists of lists As I said, I often use one or the other as a matter of convenience. I have always considered them similar types with somewhat different specialized behaviour. The most common situation is building up some table with lists (making use of the append function) and then converting the final construct into an array or not, depending on whether this seems advantageous. > in Python. Further, reduce() is a "corner" of the Python language that > has been superceded by list comprehensions. Choosing an alternative List comprehensions work in exactly the same way, by looping over the outermost index. > > 2) Minimization of code breakage. > > Fixes will be necessary for sure, and I wish that wasn't the case. They > will be necessary if we choose a consistent interface in either case. The current interface is not inconsistent. It follows a different logic than what some users expect, but there is a logic behind it. The current rules are the result of lengthy discussions and lengthy tests, though admittedly by a rather small group of people. If you arrange your arrays according to that logic, you almost never need to specify explicit axis arguments. > Choosing axis=0 or axis=-1 will not change what needs to be fixed -- > only the function names searched for. I disagree very much here. The fewer calls are concerned, the fewer mistakes are made, and the fewer modules have to be modified at all. Moreover, the functions that currently use axis=1 are more specialized and more likely to be called in similar contexts. They are also, in my limited experience, less often called with nested list arguments. I don't expect fixes to be as easy as searching for function names and adding an axis argument. Python is a very dynamic language, in which functions are objects like all others. They can be passed as arguments, stored in dictionaries and lists, assigned to variables, etc. In fact, instead of modifying any code, I'd rather write an interface module that emulates the old behaviour, which after all differs only in the default for one argument. The problem with this is that it adds another function call layer, which is rather expensive in Python. Which makes me wonder why we need this discussion at all. It is almost no extra effort to provide two different C modules that provide the same functions with different default arguments, and neither one needs to have any speed penalty. 
> True. But I can tell you that we're definitely doing something wrong > now. We have a superior language that is easier to integrate with > legacy code and less expensive than the best competing alternatives. > And, though I haven't done a serious market survey, I feel safe in > saying we have significantly less than 1% of the potential user base. I agree with that. But has anyone ever made a serious effort to find out why the whole world is not using Python? In my environment (which is too small to be representative for anything), the main reason is inertia. Most people don't want to invest any time to learn any new language, no matter what the advantages are (they remain hypothetical until you actually start to use the new language). I don't know anyone who has started to use Python and then dropped it because he was not satisfied with some aspect of the language or a library module. On the other hand, I do know projects that collapsed after a split in the user community due to some disagreement over minor details. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Scott R. <ra...@ph...> - 2002-06-11 18:53:50
|
On June 11, 2002 02:43 pm, Perry Greenfield wrote: > Yet on the other hand, it is undeniably convenient to use > repr (by typing a variable) for small arrays interactively > rather than using a print statement. This leads to 3 possible > proposals for handling repr: > > 1) Do what is done now, always print a string that when > eval'ed will recreate the array. > > 2) Only give summary information for the array regardless of > its size. > > 3) Print the array if it has fewer than THRESHOLD number of > elements, otherwise print a summary. THRESHOLD may be adjusted > by the user. > > The last appears to be the most utilitarian to us, yet > 'impure' somehow. Certainly there are may objects for which I vote for number 3, and have no hang-ups about any real or perceived "impurity". This is an issue that I deal with daily. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ra...@ph... Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
From: Perry G. <pe...@st...> - 2002-06-11 18:52:07
|
<Eric Jones writes>:
<Konrad Hinsen writes>:
> > What needs to be improved in that area?
>
> Comparisons of complex numbers. But let's save that debate for later.

No, no, let's do it now. ;-) We for one would like to know for numarray what
should be done.

If I might be presumptuous enough to anticipate what Eric would say, it is
that complex comparisons should be allowed, and that they use all the
information in the complex number (real and imaginary) so that they lead to
consistent results in sorting.

But the purist argues that comparisons for complex numbers are meaningless.
Well, yes, but there are cases in code where you don't wish such comparisons
to cause an exception. But even more important, there is at least one case
which is practical. It isn't all that uncommon to want to eliminate duplicate
values from arrays, and one would like to be able to do that for complex
values as well. A common technique is to sort the values and then eliminate
all identical adjacent values. A predictable comparison rule would allow that
to be easily implemented.

Eric, am I missing anything in this? It should be obvious that we agree with
his position, but I am wondering if there are any arguments we have not heard
yet that outweigh the advantages we see.

Perry
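The sort-then-drop-adjacent-duplicates technique Perry describes needs nothing more than a predictable ordering rule. A hedged sketch in plain Python, using a lexicographic (real, then imaginary) key of my own choosing rather than anything Numeric or numarray actually defined:

# Eliminate duplicates from a sequence of complex values by sorting with an
# explicit, predictable rule (real part first, then imaginary part), then
# dropping identical adjacent values.
def unique_complex(values):
    ordered = sorted(values, key=lambda z: (z.real, z.imag))
    out = []
    for z in ordered:
        if not out or z != out[-1]:   # == and != are well defined for complex
            out.append(z)
    return out

print(unique_complex([1+2j, 3+0j, 1+2j, 0+1j, 3+0j]))
# [1j, (1+2j), (3+0j)]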
From: Perry G. <pe...@st...> - 2002-06-11 18:43:58
|
While I'm flooding the mailing list with interface issues, I thought I would
air another one (again, for numarray only).

We've had some people internally complain that it does not make sense for repr
to always generate a string capable of reconstructing the array. We often
(usually) deal with multi-megabyte arrays. Typing a variable interactively for
one of these arrays is invariably nonsensical. In such cases the user would be
much better served by a message indicating the size, shape, type, etc. of the
array than all of its contents. Yet on the other hand, it is undeniably
convenient to use repr (by typing a variable) for small arrays interactively
rather than using a print statement. This leads to 3 possible proposals for
handling repr:

1) Do what is done now, always print a string that when eval'ed will recreate
the array.

2) Only give summary information for the array regardless of its size.

3) Print the array if it has fewer than THRESHOLD number of elements,
otherwise print a summary. THRESHOLD may be adjusted by the user.

The last appears to be the most utilitarian to us, yet 'impure' somehow.
Certainly there are many objects for which Python does not attempt to generate
a string from repr that could be used with eval to recreate them. On the other
hand, we are unaware of cases where repr sometimes does and sometimes does
not. For example, strings may also get very large, but there is no threshold
for generating the string.

What do people think is the most desirable solution? Keep in mind we intend to
develop very efficient functions that will convert arrays to and from ascii
representations (currently most of that code is in Python and quite slow in
numarray at the moment) so it will not be necessary to use repr for this
purpose.

Only a few more issues to go, hopefully...

Perry
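Proposal 3 is straightforward to prototype. A hedged sketch, assuming a NumPy-style array object purely for illustration -- the threshold handling, summary format, and function name here are not the numarray design:

import numpy as np

THRESHOLD = 1000        # None could mean "never summarize"

def array_repr(arr):
    # Full, eval-style repr below the threshold; a one-line summary above it.
    if THRESHOLD is None or arr.size < THRESHOLD:
        return repr(arr)
    flat = arr.ravel()
    # Show a couple of element values so e.g. NaN-filled arrays are easy to
    # spot, as Konrad suggests elsewhere in the thread.
    return "array(shape=%s, type=%s, first=%r, last=%r)" % (
        arr.shape, arr.dtype.name, flat[0], flat[-1])

print(array_repr(np.arange(10)))          # small array: printed in full
print(array_repr(np.zeros(10_000_000)))   # large array: one-line summary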
From: eric j. <er...@en...> - 2002-06-11 18:37:32
|
> From: Perry Greenfield [mailto:pe...@st...] > <Eric Jones wrote>: > > > Travis seemed to indicate that the Python would convert 0-d arrays to > > Python types correctly for most (all?) cases. Python indexing is a > > little unique because it explicitly requires integers. It's not just 0-d > > arrays that fail as indexes -- Python floats won't work either. > > > That's right, the primary breakage would be downstream use as > indices. That appeared to be the case with the find() method > of strings for example. > > > Yes, this would be required for using them as array indexes. Or > > actually: > > > > >>> a[int(x[2])] > > > Yes, this would be sufficient for use as indices or slices. I'm not > sure if there is any specific code that checks for float but doesn't > invoke automatic conversion. I suspect that floats are much less of > a problem this way, though will one necessarily know whether to use > int(), float(), or scalar()? If one is writing a generic function that > could accept int or float arrays then the generation of a int may > be overpresuming what the result will be used for. (Though I don't > have a particular example to give, I'll think about whether any > exist). If the only type that could possibly cause problems is int, > then int() should be all that would be necessary, but still awkward. If numarray becomes a first class citizen in the Python world as is hoped, maybe even this issue can be rectified. List/tuple indexing might be able to be changed to accept single element Integer arrays. I suspect this has major implications though -- probably a question for python-dev. eric |
From: Perry G. <pe...@st...> - 2002-06-11 18:06:20
|
<Eric Jones wrote>: > Travis seemed to indicate that the Python would convert 0-d arrays to > Python types correctly for most (all?) cases. Python indexing is a > little unique because it explicitly requires integers. It's not just 0-d > arrays that fail as indexes -- Python floats won't work either. > That's right, the primary breakage would be downstream use as indices. That appeared to be the case with the find() method of strings for example. > Yes, this would be required for using them as array indexes. Or > actually: > > >>> a[int(x[2])] > Yes, this would be sufficient for use as indices or slices. I'm not sure if there is any specific code that checks for float but doesn't invoke automatic conversion. I suspect that floats are much less of a problem this way, though will one necessarily know whether to use int(), float(), or scalar()? If one is writing a generic function that could accept int or float arrays then the generation of a int may be overpresuming what the result will be used for. (Though I don't have a particular example to give, I'll think about whether any exist). If the only type that could possibly cause problems is int, then int() should be all that would be necessary, but still awkward. Perry |
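A small illustration of the conversion being discussed -- explicitly coercing an array element before handing it to something that insists on a plain integer. The helper name is made up, and modern NumPy is used only to have something concrete to run:

import numpy as np

def as_index(value):
    # Coerce an array scalar / 0-d array to a plain Python int for indexing.
    return int(value)

x = np.array([3, 1, 4, 1, 5])
seq = ["a", "b", "c", "d", "e"]

print(seq[int(x[2])])        # explicit int(), as in the message above -> 'e'
print(seq[as_index(x[0])])   # the same idea wrapped in a helper -> 'd'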
From: eric j. <er...@en...> - 2002-06-11 17:44:04
|
> "eric jones" <er...@en...> writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also between > that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't replace > it by a sum across the last axis just because that is faster. The default axis choice influences how people choose to lay out their data in arrays. If the default is to sum down columns, then users lay out their data so that this is the order of computation. This results in strided operations. There are cases where you need to reduce over multiple data sets, etc. which is what the axis=? flag is for. But choosing the default to also be the most efficient just makes sense. The cost is even higher for wrappers around C libraries not written explicitly for Python (which is most of them), because you have to re-order the memory before passing the variables into the C loop. Of course, the axis=0 is faster for Fortran libraries with wrappers that are smart enough to recognize this (Pearu's f2py wrapped libraries now recognize this sort of thing). However, the marriage to C is more important as future growth will come in this area more than Fortran. > > > >From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > > about 10 functions using axis=-1. To this day, I can't remember which > > If you weight by frequency of usage, the first group gains a lot in > importance. I just scanned through some of my code; almost all of the > calls to Numeric routines are to functions whose default axis > is zero. Right, but I think all the reduce operators (sum, product, etc.) should have been axis=-1 in the first place. > > > code. Unfortunately, many of the Numeric functions that should still > > don't take axis as a keyword, so you and up just inserting -1 in the > > That is certainly something that should be fixed, and I suppose no one > objects to that. Sounds like Travis already did it. Thanks. > > > My vote is for keeping axis defaults as they are, both because the > choices are reasonable (there was a long discussion about them in the > early days of NumPy, and the defaults were chosen based on other array > languages that had already been in use for years) and because any > change would cause most existing NumPy code to break in many places, > often giving wrong results instead of an error message. > > If a uniformization of the default is desired, I vote for axis=0, > for two reasons: > 1) Consistency with Python usage. I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generated the same results as the Python version of reduce(add,x) until Perry pointed it out to me. There are some inconsistencies between Python the language and Numeric because the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a very utilized area of Python because of efficiency. I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superceded by list comprehensions. Choosing an alternative behavior that is generally better for array operations, as in the case of slices as views, is worth the change. 
> 2) Minimization of code breakage. Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case. Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for. > > > > We should also strive to make it as easy as possible to write generic > > functions that work for all array types (Int, Float,Float32,Complex, > > etc.) -- yet another debate to come. > > What needs to be improved in that area? Comparisons of complex numbers. But lets save that debate for later. > > > Changes are going to create some backward incompatibilities and that is > > definitely a bummer. But some changes are also necessary before the > > community gets big. I know the community is already reasonable size, > > I'd like to see evidence that changing the current NumPy behaviour > would increase the size of the community. It would first of all split > the current community, because many users (like myself) do not have > enough time to spare to go through their code line by line in order to > check for incompatibilities. That many others would switch to Python > if only some changes were made is merely an hypothesis. True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base. Even in communities where Python is relatively prevalent like astronomy, I would bet the every-day user base is less than 5% of the whole. There are a lot of holes to fill (graphics, comprehensive libraries, etc.) before we get up to the capabilities and quality of user interface that these tools have. Some of the interfaces problems are GUI and debugger related. Others are API related. Inconsistency in a library interface makes it harder to learn and is a wart. Whether it is as important as a graphics library? Probably not. But while we're building the next generation tool, we should fix things that make people wonder "why did they do this?". It is rarely a single thing that makes all the difference to a prospective user switching over. It is the overall quality of the tool that will sway them. > > > > Some feel that is contrary to expectations that the least rapidly > > > varying dimension should be operated on by default. There are > > > good arguments for both sides. For example, Konrad Hinsen has > > Actually the argument is not for the least rapidly varying > dimension, but for the first dimension. The internal data layout > is not significant for most Python array operations. We might > for example offer a choice of C style and Fortran style data layout, > enabling users to choose according to speed, compatibility, or > just personal preference. In a way, as Pearu has shown in f2py, this is already possible by jiggering the stride and dimension entries, so this doesn't even require a change to the array descriptor (I don't think...). We could supply functions that returned a Fortran layout array. This would be beneficial for some applications outside of what we're discussing now that use Fortran extensions heavily. As long as it is transparent to the extension writer (which I think it can be) it sounds fine. I think the default constructor should return a C layout array though, and will be what 99% of the users will use. eric > > Konrad. 
> -- > ------------------------------------------------------------------------ -- > ----- > Konrad Hinsen | E-Mail: hi...@cn... > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > ------------------------------------------------------------------------ -- > ----- |
From: Paul F D. <pa...@pf...> - 2002-06-11 15:28:03
|
Konrad's arguments are also very good. I guess there was a good reason we did all that arguing before -- another issue where there is a Perl-like "more than one way to do it" quandry. I think in my own coding reduction on the first dimension is the most frequent. > -----Original Message----- > From: num...@li... > [mailto:num...@li...] On > Behalf Of Konrad Hinsen > Sent: Tuesday, June 11, 2002 6:12 AM > To: eric jones > Cc: 'Perry Greenfield'; num...@li... > Subject: Re: [Numpy-discussion] RE: default axis for numarray > > > "eric jones" <er...@en...> writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also > between that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't > replace it by a sum across the last axis just because that is faster. > > > >From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, > counting FFT, > > about 10 functions using axis=-1. To this day, I can't > remember which > > If you weight by frequency of usage, the first group gains a > lot in importance. I just scanned through some of my code; > almost all of the calls to Numeric routines are to functions > whose default axis is zero. > > > code. Unfortunately, many of the Numeric functions that > should still > > don't take axis as a keyword, so you and up just inserting -1 in the > > That is certainly something that should be fixed, and I > suppose no one objects to that. > > > My vote is for keeping axis defaults as they are, both > because the choices are reasonable (there was a long > discussion about them in the early days of NumPy, and the > defaults were chosen based on other array languages that had > already been in use for years) and because any change would > cause most existing NumPy code to break in many places, often > giving wrong results instead of an error message. > > If a uniformization of the default is desired, I vote for > axis=0, for two reasons: > 1) Consistency with Python usage. > 2) Minimization of code breakage. > > > > We should also strive to make it as easy as possible to > write generic > > functions that work for all array types (Int, Float,Float32,Complex, > > etc.) -- yet another debate to come. > > What needs to be improved in that area? > > > Changes are going to create some backward incompatibilities > and that > > is definitely a bummer. But some changes are also necessary before > > the community gets big. I know the community is already reasonable > > size, > > I'd like to see evidence that changing the current NumPy > behaviour would increase the size of the community. It would > first of all split the current community, because many users > (like myself) do not have enough time to spare to go through > their code line by line in order to check for > incompatibilities. That many others would switch to Python if > only some changes were made is merely an hypothesis. > > > > Some feel that is contrary to expectations that the least rapidly > > > varying dimension should be operated on by default. There > are good > > > arguments for both sides. For example, Konrad Hinsen has > > Actually the argument is not for the least rapidly varying > dimension, but for the first dimension. The internal data > layout is not significant for most Python array operations. 
> We might for example offer a choice of C style and Fortran > style data layout, enabling users to choose according to > speed, compatibility, or just personal preference. > > Konrad. > -- > -------------------------------------------------------------- > ----------------- > Konrad Hinsen | E-Mail: > hi...@cn... > Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 > Rue Charles Sadron | Fax: +33-2.38.63.15.17 > 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ > France | Nederlands/Francais > -------------------------------------------------------------- > ----------------- > > _______________________________________________________________ > > Don't miss the 2002 Sprint PCS Application Developer's > Conference August 25-28 in Las Vegas - > http://devcon.sprintpcs.com/adp/index.cfm?> source=osdntextlink > > > _______________________________________________ > Numpy-discussion mailing list Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |
From: Konrad H. <hi...@cn...> - 2002-06-11 13:16:46
|
"eric jones" <er...@en...> writes: > The issue here is both consistency across a library and speed. Consistency, fine. But not just within one package, also between that package and the language it is implemented in. Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster. > >From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero. > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the That is certainly something that should be fixed, and I suppose no one objects to that. My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message. If a uniformization of the default is desired, I vote for axis=0, for two reasons: 1) Consistency with Python usage. 2) Minimization of code breakage. > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. What needs to be improved in that area? > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. I know the community is already reasonable size, I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely an hypothesis. > > Some feel that is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There are > > good arguments for both sides. For example, Konrad Hinsen has Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Konrad H. <hi...@cn...> - 2002-06-11 12:56:26
|
Travis Oliphant <oli...@ie...> writes: > Actually, the code in PyArg_ParseTuple asks the object it gets if it > knows how to be a float. 0-d arrays for some time have known how to be > Python floats. So, I do not think this error occurs as you've > described. Could you demonstrate this error? No, it seems gone indeed. I remember a lengthy battle due to this problem, but that was a long time ago. > The only exception to this that I've seen is the list indexing code > (probably for optimization purposes). There could be more places, but > I have not found them or heard of them. Even for indexing, I don't see the point. If you test for the int type and do conversion attempts only for non-ints, that shouldn't slow down normal usage at all. > have now. I'm quite supportive of never returning Python scalars from > Numeric array operations unless specifically requested (e.g. the > toscalar method). I suppose this would be easy to implement, right? Then why not do it in a test release and find out empirically how much code it breaks. > presumption based? If I encounter a Python object that I'm unfamiliar > with, I don't presume to know how it will define multiplication. But if that object pretends to be a number type, a sequence type, a mapping type, etc., I do make assumptions about its behaviour. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hi...@cn... Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- |
From: Paul F D. <pa...@pf...> - 2002-06-11 03:56:12
|
I guess the argument for uniformity is pretty persuasive after all. (I know, I don't fit in on the Net, you can change my mind). Actually, don't we have a quick and dirty out here? Suppose we make the more uniform choice for Numarray, and then make a new module, say NumericCompatibility, which defines aliases to everything in Numarray that is the same as Numeric and then for the rest defines functions with the same names but the Numeric defaults, implemented by calling the ones in Numarray. Then changing "import Numeric" to "import NumericCompatibility as Numeric" ought to be enough to get someone working or close to working again. Someone posted something about "retrofitting" stuff from Numarray to Numeric. I cannot say strongly enough that I oppose this. Numeric itself must be frozen asap and eliminated eventually or there is no point to having developed a replacement that is easier to expand and maintain. We would have just doubled our workload for nothing. |
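Paul's NumericCompatibility idea amounts to a thin shim. A heavily simplified, hedged sketch -- it assumes a numarray-like package whose functions accept an axis keyword, and covers only a couple of names to show the shape of the thing:

# NumericCompatibility.py (illustrative only)
# Alias everything that behaves identically, then re-wrap just the functions
# whose default axis changed, restoring the old Numeric defaults.
import numarray
from numarray import *              # everything unchanged comes through as-is

def sum(a, axis=0):                 # old Numeric default
    return numarray.sum(a, axis=axis)

def take(a, indices, axis=0):       # old Numeric default
    return numarray.take(a, indices, axis=axis)

# ...and so on for the rest of the changed defaults.
# Old scripts then only need:  import NumericCompatibility as Numeric

As Konrad notes elsewhere in the thread, the cost of this approach is an extra Python call layer on every wrapped function.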
From: Travis O. <oli...@ie...> - 2002-06-11 03:51:25
|
On Mon, 2002-06-10 at 19:55, Scott Ransom wrote:
> I have to admit that I agree with all of what Eric has to say here -- even
> if it does cause some code breakage (I'm certainly willing to do some
> maintenance on my code/modules that are floating here and there so long as
> things continue to improve with the language as a whole).

I'm generally of the same opinion.

> I do think consistency is a very important aspect of getting
> Numeric/Numarray accepted by a larger user base (and believe me, my
> collaborators are probably sick of my Numeric Python evangelism (but I
> like to think also a bit jealous of my NumPy usage as they continue
> struggling with one-off C and Fortran routines...)).

Another important factor is the support libraries. I know that something like
Simulink (Matlab) is important to many of my colleagues in engineering.
Simulink is the MathWorks version of visual programming which lets the user
create a circuit visually which is then processed. I believe there was a good
start to this sort of thing presented at the last Python Conference which was
very encouraging. Other colleagues require something like a compiler to get
C code which will compile on a DSP board from a script and/or design session.
I believe something like this would be very beneficial.

> Another example of a glaring inconsistency in the current implementation
> is this little number that has been bugging me for awhile:
>
> >>> arange(10, typecode='d')
> array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
> >>> ones(10, typecode='d')
> array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
> >>> zeros(10, typecode='d')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: an integer is required
> >>> zeros(10, 'd')
> array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

This is now fixed in cvs, along with other keyword problems. The ufunc methods
reduce and accumulate also now take a keyword argument in CVS.

-Travis
From: Paul F D. <pa...@pf...> - 2002-06-11 03:19:42
|
It is time to choose the next "head nummie", the chair of the set of sourceforge developers for Numerical Python. Now is an apt time since I will be changing assignments at LLNL in August to one which has less daily use of numpy. We have no procedure for doing this other than for us nummies to come to a consensus amongst ourselves, with the input of the Numpy community. After I return from Europython I hope we can make a selection during the first two weeks of July. |
From: Scott R. <ra...@ph...> - 2002-06-11 01:56:00
|
I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there so long as things continue to improve with the language as a whole). I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my colaborators are probably sick of my Numeric Python evangelism (but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...)). Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for awhile: >>> arange(10, typecode='d') array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]) >>> ones(10, typecode='d') array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) >>> zeros(10, typecode='d') Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: an integer is required >>> zeros(10, 'd') array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) Anyway, these little warts that we are discussing probably haven't kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look. On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation. Cheers, Scott On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote: > So one contentious issue a day isn't enough, huh? :-) > > > An issue that has been raised by scipy (most notably Eric Jones > > and Travis Oliphant) has been whether the default axis used by > > various functions should be changed from the current Numeric > > default. This message is not directed at determining whether we > > should change the current Numeric behavior for Numeric, but whether > > numarray should adopt the same behavior as the current Numeric. > > > > To be more specific, certain functions and methods, such as > > add.reduce(), operate by default on the first axis. For example, > > if x is a 2 x 10 array, then add.reduce(x) results in a > > 10 element array, where elements in the first dimension has > > been summed over rather than the most rapidly varying dimension. > > > > >>> x = arange(20) > > >>> x.shape = (2,10) > > >>> x > > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > > [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > > >>> add.reduce(x) > > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) > > The issue here is both consistency across a library and speed. > > >From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which > functions use which and have resorted to explicitly using axis=-1 in my > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the > argument list (but this is a different issue -- it just needs to be > fixed). > > SciPy always uses axis=-1 for operations. There are 60+ functions with > this convention. Choosing -1 offers the best cache use and therefore > should be more efficient. 
Defaulting to the fastest behavior is > convenient because new users don't need any special knowledge of > Numeric's implementation to get near peak performance. Also, there is > never a question about which axis is used for calculations. > > When using SciPy and Numeric, their function sets are completely > co-mingled. When adding SciPy and Numeric's function counts together, > it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a > standard, it is impossible for the interface to become intuitive because > of the exceptions to the rule from Numeric. > > So here what I think. All functions should default to the same axis so > that the interface to common functions can become second nature for new > users and experts alike. Further, the chosen axis should be the most > efficient for the most cases. > > There are actually a few functions that, taken in isolation, I think > should have axis=0. take() is an example. But, for the sake of > consistency, it too should use axis=-1. > > It has been suggested to recommend that new users always specify axis=? > as a keyword in functions that require an axis argument. This might be > fine when writing modules, but always having to type: > > >>> sum(a,axis=-1) > > in command line mode is a real pain. > > Just a point about the larger picture here... The changes we're > discussing are intended to clean up the warts on Numeric -- and, as good > as it is overall, these are warts in terms of usability. Interfaces > should be consistent across a library. The return types from functions > should be consistent regardless of input type (or shape). Default > arguments to the same keyword should also be consistent across > functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 > as default, returning arrays or scalars from Numeric functions and > indexing), but the choice made should be applied as consistently as > possible. > > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. > > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. I know the community is already reasonable size, > but I also believe, based on the strength of Python, Numeric, and > libraries such as Scientific and SciPy, the community can grow by 2 > orders of magnitude over the next five years. This kind of growth can't > occur if only savvy developers see the benefits of the elegant language. > It can only occur if the general scientist see Python as a compelling > alternative to Matlab (and IDL) as their day-in/day-out command line > environment for scientific/engineering analysis. Making the interface > consistent is one of several steps to making Python more attractive to > this community. > > Whether the changes made for numarray should be migrated back into > Numeric is an open question. I think they should, but see Konrad's > counterpoint. I'm willing for SciPy to be the intermediate step in the > migration between the two, but also think that is sub-optimal. > > > > > Some feel that is contrary to expectations that the least rapidly > > varying dimension should be operated on by default. There are > > good arguments for both sides. For example, Konrad Hinsen has > > argued that the current behavior is most compatible for behavior > > of other Python sequences. 
For example, > > > > >>> sum = 0 > > >>> for subarr in x: > > sum += subarr > > > > acts on the first axis in effect. Likewise > > > > >>> reduce(add, x) > > > > does likewise. In this sense, Numeric is currently more consistent > > with Python behavior. However, there are other functions that > > operate on the most rapidly varying dimension. Unfortunately > > I cannot currently access my old mail, but I think the rule > > that was proposed under this argument was that if the 'reduction' > > operation was of a structural kind, the first dimension is used. > > If the reduction or processing step is 'time-series' oriented > > (e.g., FFT, convolve) then the last dimension is the default. > > On the other hand, some feel it would be much simpler to understand > > if the last axis was the default always. > > > > The question is whether there is a consensus for one approach or > > the other. We raised this issue at a scientific Birds-of-a-Feather > > session at the last Python Conference. The sense I got there was > > that most were for the status quo, keeping the behavior as it is > > now. Is the same true here? In the absence of consensus or a > > convincing majority, we will keep the behavior the same for backward > > compatibility purposes. > > Obviously, I'm more opinionated about this now than I was then. I > really urge you to consider using axis=-1 everywhere. SciPy is not the > only scientific library, but I think it adds the most functions with a > similar signature (the stats module is full of them). I very much hope > for a consistent interface across all of Python's scientific functions > because command line users aren't going to care whether sum() and > kurtosis() come from different libraries; they just want them to behave > consistently. > > eric > > > > > Perry -- -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ra...@ph... Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 |
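A note on the arange/ones/zeros inconsistency shown at the top of this message: until the keyword handling is made uniform, a thin wrapper is enough to paper over it. This is a minimal sketch assuming the Numeric behavior quoted above (zeros() accepts a typecode positionally but rejects it as a keyword); the name zeros_kw and its default typecode are made up for illustration, not part of any library.

    from Numeric import zeros

    def zeros_kw(shape, typecode='l'):
        # zeros() rejects typecode= as a keyword but accepts it positionally,
        # so forward it positionally here.  'l' (Int) is an assumed default.
        return zeros(shape, typecode)

    print zeros_kw(10, typecode='d')   # array([ 0., 0., ..., 0.])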
From: eric j. <er...@en...> - 2002-06-10 23:15:41
|
So one contentious issue a day isn't enough, huh? :-) > An issue that has been raised by scipy (most notably Eric Jones > and Travis Oliphant) has been whether the default axis used by > various functions should be changed from the current Numeric > default. This message is not directed at determining whether we > should change the current Numeric behavior for Numeric, but whether > numarray should adopt the same behavior as the current Numeric. > > To be more specific, certain functions and methods, such as > add.reduce(), operate by default on the first axis. For example, > if x is a 2 x 10 array, then add.reduce(x) results in a > 10 element array, where elements in the first dimension have > been summed over rather than the most rapidly varying dimension. > > >>> x = arange(20) > >>> x.shape = (2,10) > >>> x > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) > >>> add.reduce(x) > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) The issue here is both consistency across a library and speed. From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0 which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which functions use which and have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed). SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations. When using SciPy and Numeric, their function sets are completely co-mingled. When adding SciPy and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric. So here's what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient for most cases. There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1. It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type: >>> sum(a,axis=-1) in command line mode is a real pain. Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions and indexing), but the choice made should be applied as consistently as possible. 
We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come. Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, the community can grow by 2 orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if the general scientist sees Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community. Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but also think that is sub-optimal. > > Some feel it is contrary to expectations that the least rapidly > varying dimension should be operated on by default. There are > good arguments for both sides. For example, Konrad Hinsen has > argued that the current behavior is most compatible with the behavior > of other Python sequences. For example, > > >>> sum = 0 > >>> for subarr in x: > sum += subarr > > acts on the first axis in effect. Likewise > > >>> reduce(add, x) > > does likewise. In this sense, Numeric is currently more consistent > with Python behavior. However, there are other functions that > operate on the most rapidly varying dimension. Unfortunately > I cannot currently access my old mail, but I think the rule > that was proposed under this argument was that if the 'reduction' > operation was of a structural kind, the first dimension is used. > If the reduction or processing step is 'time-series' oriented > (e.g., FFT, convolve) then the last dimension is the default. > On the other hand, some feel it would be much simpler to understand > if the last axis was the default always. > > The question is whether there is a consensus for one approach or > the other. We raised this issue at a scientific Birds-of-a-Feather > session at the last Python Conference. The sense I got there was > that most were for the status quo, keeping the behavior as it is > now. Is the same true here? In the absence of consensus or a > convincing majority, we will keep the behavior the same for backward > compatibility purposes. Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions because command line users aren't going to care whether sum() and kurtosis() come from different libraries; they just want them to behave consistently. eric > > Perry |
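One practical aside on the default-axis question: whichever default wins, code that spells the axis out explicitly reads the same under either convention. A minimal sketch, assuming only add.reduce's documented axis argument; the example array is made up.

    from Numeric import array, add

    x = array([[0, 1, 2],
               [3, 4, 5]])

    # Passing the axis explicitly removes any dependence on a library's default.
    print add.reduce(x, 0)     # over the first axis          -> [3 5 7]
    print add.reduce(x, -1)    # over the last (fastest) axis -> [ 3 12]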
From: Paul F D. <pa...@pf...> - 2002-06-10 23:05:14
|
> Konrad mentioned the tuple parsing issue in some > extension libraries that expect floats, but it sounds like > Travis thinks this is no longer an issue. Are there others? > > eric > Lots of code tries to distinguish cases using isinstance, and these tests will fail if given an array instance when they are testing for a float. |
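To make the isinstance breakage concrete, here is a small sketch; the function names are invented for illustration, and the float() coercion of a rank-0 array is assumed to work as Travis suggests elsewhere in this thread.

    import types
    from Numeric import array

    def halve_strict(x):
        # The common existing pattern: a rank-0 Float array fails this test.
        if not isinstance(x, types.FloatType):
            raise TypeError("expected a float")
        return x / 2.0

    def halve_tolerant(x):
        # Coercing through float() accepts Python ints and floats, and
        # (assuming rank-0 arrays convert cleanly) array(3.0) as well.
        return float(x) / 2.0

    print halve_strict(3.0)            # 1.5
    print halve_tolerant(array(3.0))   # 1.5
    print halve_strict(array(3.0))     # raises TypeError with today's check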
From: eric j. <er...@en...> - 2002-06-10 21:26:45
|
> <Eric Jones writes>: > > I further believe that all Numeric functions (sum, product, etc.) should > > return arrays all the time instead of implicitly converting > > them to Python scalars in special cases such as reductions of 1d arrays. > > I think the only reason for the silent conversion is that Python lists > > only allow integer values for use in indexing so that: > > > > >>> a = [1,2,3,4] > > >>> a[array(0)] > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > TypeError: sequence index must be integer > > > > Numeric arrays don't have this problem: > > > > >>> a = array([1,2,3,4]) > > >>> a[array(0)] > > 1 > > > > I don't think this alone is a strong enough reason for the conversion. > > Getting rid of special cases is more important because it makes behavior > > predictable to the novice (and expert), and it is easier to write > > generic functions and be sure they will not break a year from now when > > one of the special cases occurs. > > > > Are there other reasons why scalars are returned? > > > Well, sure. It isn't just indexing lists directly, it would be > anywhere in Python that you would use a number. Travis seemed to indicate that Python would convert 0-d arrays to Python types correctly for most (all?) cases. Python indexing is a little unique because it explicitly requires integers. It's not just 0-d arrays that fail as indexes -- Python floats won't work either. As for passing arrays to functions expecting numbers, is it that much different than passing an integer into a function that does floating point operations? Python handles this casting automatically. It seems like it should do the same for 0-d arrays if they know how to "look like" Python types. > In some contexts, > the right thing may happen (where the function knows to try to obtain > a simple number from an object), but then again, it may not (if calling > a function where the number is used directly to index or slice). > > Here is another case where good arguments can be made for both > sides. It really isn't an issue of functionality (one can write > methods or functions to do what is needed), it's what the convenient > syntax does. For example, if we really want a Python scalar but > rank-0 arrays are always returned then something like this may > be required: > > >>> x = arange(10) > >>> a = range(10) > >>> a[scalar(x[2])] # instead of a[x[2]] Yes, this would be required for using them as array indexes. Or actually: >>> a[int(x[2])] > > Whereas if simple indexing returns a Python scalar and consistency > is desired in always having arrays returned one may have to do > something like this > > >>> y = x.indexAsArray(2) # instead of y = x[2] > > or perhaps > > >>> y = x[ArrayAlwaysAsResultIndexObject(2)] > # :-) with better name, of course > > One context or the other is going to be inconvenienced, but not > prevented from doing what is needed. Right. > > As long as Python scalars are the 'biggest' type of their kind, we > strongly lean towards single elements being converted into Python > scalars. It's our feeling that there are more surprises and gotchas, > particularly for more casual users, on this side than on the uncertainty > of an index returning an array or scalar. People writing code that > expects to deal with uncertain dimensionality (the only place that > this occurs) should be the ones to go the extra distance in more > awkward syntax. 
Well, I guess I'd like to figure out exactly what breaks before ruling it out because consistently returning the same type from functions/indexing is beneficial. It becomes even more beneficial with the exception behavior used by SciPy and numarray. The two breakage cases I'm aware of are (1) indexing and (2) functions that explicitly check for arguments of IntType, DoubleType, or ComplexType. When searching the standard library for these guys, they only turn up in copy, pickle, xmlrpclib, and the types module -- all in innocuous ways. Searching for 'float' (which is equal to FloatType) doesn't turn up any code that breaks this either. A search of my site-packages turned up IntType tests quite a bit -- primarily in SciPy. Some of these would go away with this change, and many were harmless. I saw a few that would need fixing (several in special.py), but the fix was trivial. eric |
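For the indexing case conceded above, the int() idiom works whether x[2] comes back as a Python integer (today's behavior) or as a rank-0 array (under the proposal), assuming rank-0 integer arrays convert via int() as discussed. A short sketch:

    from Numeric import arange, array

    x = arange(10)
    a = range(10)

    i = x[2]           # today a Python int; under the proposal, a rank-0 array
    print a[int(i)]    # int() is a no-op on an int and extracts a rank-0 array

    # The same idiom covers an explicitly constructed rank-0 index:
    print a[int(array(2))]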
From: Perry G. <pe...@st...> - 2002-06-10 20:36:12
|
An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric. To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10 element array, where elements in the first dimension have been summed over rather than the most rapidly varying dimension. >>> x = arange(20) >>> x.shape = (2,10) >>> x array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) >>> add.reduce(x) array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28]) Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example, >>> sum = 0 >>> for subarr in x: sum += subarr acts on the first axis in effect. Likewise >>> reduce(add, x) does likewise. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation was of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve) then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis was the default always. The question is whether there is a consensus for one approach or the other. We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes. Perry |
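For reference, a minimal check of the consistency argument above, using only Numeric's add and Python's builtin reduce; the plain for-loop, the builtin reduce, and add.reduce's current default all operate over the first axis and agree (the variable is named total here to avoid shadowing sum).

    from Numeric import arange, add

    x = arange(20)
    x.shape = (2, 10)

    total = 0
    for subarr in x:          # iteration walks the first axis
        total = total + subarr

    print add.reduce(x)       # current default: reduce over the first axis
    print reduce(add, x)      # builtin reduce over the same axis
    print total               # all three print [10 12 14 16 18 20 22 24 26 28]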