From: Scott R. <ra...@ph...> - 2002-06-11 01:56:00
I have to admit that I agree with all of what Eric has to say here --
even if it does cause some code breakage (I'm certainly willing to do
some maintenance on my code/modules that are floating here and there
so long as things continue to improve with the language as a whole).

I do think consistency is a very important aspect of getting
Numeric/Numarray accepted by a larger user base (and believe me, my
collaborators are probably sick of my Numeric Python evangelism (but I
like to think also a bit jealous of my NumPy usage as they continue
struggling with one-off C and Fortran routines...)).

Another example of a glaring inconsistency in the current
implementation is this little number that has been bugging me for
a while:

>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Anyway, these little warts that we are discussing probably haven't
kept my astronomer friends from switching from IDL, but as things
progress and well-known astronomical or other scientific software
packages are released based on Python (like pyraf) from well-known
groups (like STScI/NASA), they will certainly take a closer look.

On a slightly different note, my hearty thanks to all the developers
for all of your hard work so far. Numeric/Numarray+Python is a
fantastic platform for scientific computation.

Cheers,

Scott

On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote:
> So one contentious issue a day isn't enough, huh? :-)
>
> > An issue that has been raised by scipy (most notably Eric Jones
> > and Travis Oliphant) has been whether the default axis used by
> > various functions should be changed from the current Numeric
> > default.
> > This message is not directed at determining whether we
> > should change the current Numeric behavior for Numeric, but whether
> > numarray should adopt the same behavior as the current Numeric.
> >
> > To be more specific, certain functions and methods, such as
> > add.reduce(), operate by default on the first axis. For example,
> > if x is a 2 x 10 array, then add.reduce(x) results in a
> > 10 element array, where elements in the first dimension have
> > been summed over rather than the most rapidly varying dimension.
> >
> > >>> x = arange(20)
> > >>> x.shape = (2,10)
> > >>> x
> > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
> >        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> > >>> add.reduce(x)
> > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
>
> The issue here is both consistency across a library and speed.
>
> From the numpy.pdf, Numeric looks to have about 16 functions using
> axis=0 (or index=0 which should really be axis=0) and, counting FFT,
> about 10 functions using axis=-1. To this day, I can't remember which
> functions use which and have resorted to explicitly using axis=-1 in my
> code. Unfortunately, many of the Numeric functions that should still
> don't take axis as a keyword, so you end up just inserting -1 in the
> argument list (but this is a different issue -- it just needs to be
> fixed).
>
> SciPy always uses axis=-1 for operations. There are 60+ functions with
> this convention. Choosing -1 offers the best cache use and therefore
> should be more efficient. Defaulting to the fastest behavior is
> convenient because new users don't need any special knowledge of
> Numeric's implementation to get near peak performance. Also, there is
> never a question about which axis is used for calculations.
>
> When using SciPy and Numeric, their function sets are completely
> co-mingled. When adding SciPy and Numeric's function counts together,
> it is 70 to 16 for axis=-1 vs. axis=0.
> Even though SciPy chose a
> standard, it is impossible for the interface to become intuitive because
> of the exceptions to the rule from Numeric.
>
> So here's what I think. All functions should default to the same axis so
> that the interface to common functions can become second nature for new
> users and experts alike. Further, the chosen axis should be the most
> efficient for the most cases.
>
> There are actually a few functions that, taken in isolation, I think
> should have axis=0. take() is an example. But, for the sake of
> consistency, it too should use axis=-1.
>
> It has been suggested to recommend that new users always specify axis=?
> as a keyword in functions that require an axis argument. This might be
> fine when writing modules, but always having to type:
>
> >>> sum(a,axis=-1)
>
> in command line mode is a real pain.
>
> Just a point about the larger picture here... The changes we're
> discussing are intended to clean up the warts on Numeric -- and, as good
> as it is overall, these are warts in terms of usability. Interfaces
> should be consistent across a library. The return types from functions
> should be consistent regardless of input type (or shape). Default
> arguments to the same keyword should also be consistent across
> functions. Some issues are left to debate (i.e. using axis=-1 or axis=0
> as default, returning arrays or scalars from Numeric functions and
> indexing), but the choice made should be applied as consistently as
> possible.
>
> We should also strive to make it as easy as possible to write generic
> functions that work for all array types (Int, Float, Float32, Complex,
> etc.) -- yet another debate to come.
>
> Changes are going to create some backward incompatibilities and that is
> definitely a bummer. But some changes are also necessary before the
> community gets big.
> I know the community is already a reasonable size,
> but I also believe, based on the strength of Python, Numeric, and
> libraries such as Scientific and SciPy, the community can grow by 2
> orders of magnitude over the next five years. This kind of growth can't
> occur if only savvy developers see the benefits of the elegant language.
> It can only occur if the general scientist sees Python as a compelling
> alternative to Matlab (and IDL) as their day-in/day-out command line
> environment for scientific/engineering analysis. Making the interface
> consistent is one of several steps to making Python more attractive to
> this community.
>
> Whether the changes made for numarray should be migrated back into
> Numeric is an open question. I think they should, but see Konrad's
> counterpoint. I'm willing for SciPy to be the intermediate step in the
> migration between the two, but also think that is sub-optimal.
>
> >
> > Some feel that it is contrary to expectations that the least rapidly
> > varying dimension should be operated on by default. There are
> > good arguments for both sides. For example, Konrad Hinsen has
> > argued that the current behavior is most compatible with the behavior
> > of other Python sequences. For example,
> >
> > >>> sum = 0
> > >>> for subarr in x:
> > ...     sum += subarr
> >
> > acts on the first axis in effect. Likewise
> >
> > >>> reduce(add, x)
> >
> > does likewise. In this sense, Numeric is currently more consistent
> > with Python behavior. However, there are other functions that
> > operate on the most rapidly varying dimension. Unfortunately
> > I cannot currently access my old mail, but I think the rule
> > that was proposed under this argument was that if the 'reduction'
> > operation was of a structural kind, the first dimension is used.
> > If the reduction or processing step is 'time-series' oriented
> > (e.g., FFT, convolve) then the last dimension is the default.
> > On the other hand, some feel it would be much simpler to understand
> > if the last axis was the default always.
> >
> > The question is whether there is a consensus for one approach or
> > the other. We raised this issue at a scientific Birds-of-a-Feather
> > session at the last Python Conference. The sense I got there was
> > that most were for the status quo, keeping the behavior as it is
> > now. Is the same true here? In the absence of consensus or a
> > convincing majority, we will keep the behavior the same for backward
> > compatibility purposes.
>
> Obviously, I'm more opinionated about this now than I was then. I
> really urge you to consider using axis=-1 everywhere. SciPy is not the
> only scientific library, but I think it adds the most functions with a
> similar signature (the stats module is full of them). I very much hope
> for a consistent interface across all of Python's scientific functions
> because command line users aren't going to care whether sum() and
> kurtosis() come from different libraries, they just want them to behave
> consistently.
>
> eric
>
> >
> > Perry

-- 
Scott M. Ransom              Address:  McGill Univ. Physics Dept.
Phone:  (514) 398-6492                 3600 University St., Rm 338
email:  ra...@ph...                    Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
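
For anyone reading this thread without Numeric at hand, the first-axis versus last-axis distinction debated above can be sketched with plain Python lists. This is only an illustration: the nested list x stands in for the 2 x 10 array in Perry's example, and the names first_axis and last_axis are made up here rather than taken from Numeric, numarray, or SciPy.

```python
from functools import reduce
from operator import add

# A 2 x 10 nested list standing in for the array in Perry's example
# (assumption: plain lists, not Numeric arrays).
x = [list(range(0, 10)), list(range(10, 20))]

# First-axis reduction: combine the rows element by element, which is
# what iterating over the outer sequence does (Konrad's argument).
first_axis = reduce(lambda a, b: [p + q for p, q in zip(a, b)], x)

# Last-axis reduction: sum within each row, i.e. over the most rapidly
# varying dimension (the axis=-1 convention Eric argues for).
last_axis = [reduce(add, row) for row in x]

print(first_axis)  # [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
print(last_axis)   # [45, 145]
```

The first result matches the add.reduce(x) output quoted above; the second is what a last-axis default would give for the same data.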