From: Perry G. <pe...@st...> - 2002-06-10 20:36:12
|
An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric.

To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10 element array, where elements in the first dimension have been summed over rather than the most rapidly varying dimension.

>>> x = arange(20)
>>> x.shape = (2,10)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> add.reduce(x)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example,

>>> sum = 0
>>> for subarr in x:
...     sum += subarr

acts on the first axis in effect. Likewise

>>> reduce(add, x)

does likewise. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation was of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve) then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis were always the default. The question is whether there is a consensus for one approach or the other.
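[Editor's note: the behaviour described above carries over to modern NumPy, used below in place of the original Numeric purely for illustration. The sketch also checks Konrad's point that Python's built-in reduce() acts on the first axis:]

```python
import numpy as np
from functools import reduce
from operator import add

x = np.arange(20).reshape(2, 10)

first_axis = np.add.reduce(x)           # default axis=0: sums down columns
last_axis = np.add.reduce(x, axis=-1)   # sums along each row instead

print(first_axis)  # [10 12 14 16 18 20 22 24 26 28]
print(last_axis)   # [ 45 145]

# Python's built-in reduce() iterates over the first axis,
# so it matches the axis=0 default, as Konrad argues.
print(np.array_equal(reduce(add, x), first_axis))  # True
```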
We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes. Perry |
From: eric j. <er...@en...> - 2002-06-10 23:15:41
|
So one contentious issue a day isn't enough, huh? :-)

> An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric.
>
> To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10 element array, where elements in the first dimension have been summed over rather than the most rapidly varying dimension.
>
> >>> x = arange(20)
> >>> x.shape = (2,10)
> >>> x
> array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
>        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> >>> add.reduce(x)
> array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

The issue here is both consistency across a library and speed. From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0, which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which functions use which and have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed).

SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations.
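[Editor's note: Eric's cache argument can be made concrete with array strides. The sketch uses modern NumPy as a stand-in for Numeric; the specific numbers assume a C-ordered float64 array.]

```python
import numpy as np

a = np.zeros((1000, 1000))  # C (row-major) layout: rows are contiguous

# Strides in bytes per step along each axis: moving along the last
# axis advances 8 bytes; moving along the first axis jumps 8000.
print(a.strides)  # (8000, 8)

# Reducing over axis=-1 therefore reads each row sequentially from
# memory, while reducing over axis=0 takes an 8000-byte stride per
# element -- the cache-use difference Eric describes.
print(a.sum(axis=-1).shape)  # (1000,)
print(a.sum(axis=0).shape)   # (1000,)
```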
When using SciPy and Numeric, their function sets are completely co-mingled. When adding SciPy and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric.

So here's what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient for most cases. There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1.

It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type

>>> sum(a,axis=-1)

in command line mode is a real pain.

Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (e.g., using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions, and indexing), but the choice made should be applied as consistently as possible. We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come.

Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big.
I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, the community can grow by 2 orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if general scientists see Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community.

Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but also think that is sub-optimal.

> Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example,
>
> >>> sum = 0
> >>> for subarr in x:
> ...     sum += subarr
>
> acts on the first axis in effect. Likewise
>
> >>> reduce(add, x)
>
> does likewise. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation was of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve) then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis were always the default.
> The question is whether there is a consensus for one approach or the other. We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes.

Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions, because command line users aren't going to care whether sum() and kurtosis() come from different libraries; they just want them to behave consistently.

eric

> Perry
|
From: Konrad H. <hi...@cn...> - 2002-06-11 13:16:46
|
"eric jones" <er...@en...> writes: > The issue here is both consistency across a library and speed. Consistency, fine. But not just within one package, also between that package and the language it is implemented in. Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster. > >From the numpy.pdf, Numeric looks to have about 16 functions using > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > about 10 functions using axis=-1. To this day, I can't remember which If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero. > code. Unfortunately, many of the Numeric functions that should still > don't take axis as a keyword, so you and up just inserting -1 in the That is certainly something that should be fixed, and I suppose no one objects to that. My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message. If a uniformization of the default is desired, I vote for axis=0, for two reasons: 1) Consistency with Python usage. 2) Minimization of code breakage. > We should also strive to make it as easy as possible to write generic > functions that work for all array types (Int, Float,Float32,Complex, > etc.) -- yet another debate to come. What needs to be improved in that area? > Changes are going to create some backward incompatibilities and that is > definitely a bummer. But some changes are also necessary before the > community gets big. 
> I know the community is already a reasonable size,

I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely a hypothesis.

> > Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has

Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
|
From: Paul F D. <pa...@pf...> - 2002-06-11 15:28:03
|
Konrad's arguments are also very good. I guess there was a good reason we did all that arguing before -- another issue where there is a Perl-like "more than one way to do it" quandary. I think in my own coding, reduction on the first dimension is the most frequent.
|
From: eric j. <er...@en...> - 2002-06-11 17:44:04
|
> "eric jones" <er...@en...> writes: > > > The issue here is both consistency across a library and speed. > > Consistency, fine. But not just within one package, also between > that package and the language it is implemented in. > > Speed, no. If I need a sum along the first axis, I won't replace > it by a sum across the last axis just because that is faster. The default axis choice influences how people choose to lay out their data in arrays. If the default is to sum down columns, then users lay out their data so that this is the order of computation. This results in strided operations. There are cases where you need to reduce over multiple data sets, etc. which is what the axis=? flag is for. But choosing the default to also be the most efficient just makes sense. The cost is even higher for wrappers around C libraries not written explicitly for Python (which is most of them), because you have to re-order the memory before passing the variables into the C loop. Of course, the axis=0 is faster for Fortran libraries with wrappers that are smart enough to recognize this (Pearu's f2py wrapped libraries now recognize this sort of thing). However, the marriage to C is more important as future growth will come in this area more than Fortran. > > > >From the numpy.pdf, Numeric looks to have about 16 functions using > > axis=0 (or index=0 which should really be axis=0) and, counting FFT, > > about 10 functions using axis=-1. To this day, I can't remember which > > If you weight by frequency of usage, the first group gains a lot in > importance. I just scanned through some of my code; almost all of the > calls to Numeric routines are to functions whose default axis > is zero. Right, but I think all the reduce operators (sum, product, etc.) should have been axis=-1 in the first place. > > > code. 
> > Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the
>
> That is certainly something that should be fixed, and I suppose no one objects to that.

Sounds like Travis already did it. Thanks.

> My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message.
>
> If a uniformization of the default is desired, I vote for axis=0, for two reasons:
> 1) Consistency with Python usage.

I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add,x) until Perry pointed it out to me.

There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a heavily used area of Python because of efficiency. I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and are used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative behavior that is generally better for array operations, as in the case of slices as views, is worth the change.

> 2) Minimization of code breakage.

Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case. Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for.
> > We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come.
>
> What needs to be improved in that area?

Comparisons of complex numbers. But let's save that debate for later.

> > Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size,
>
> I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely a hypothesis.

True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base. Even in communities where Python is relatively prevalent, like astronomy, I would bet the everyday user base is less than 5% of the whole.

There are a lot of holes to fill (graphics, comprehensive libraries, etc.) before we get up to the capabilities and quality of user interface that these tools have. Some of the interface problems are GUI and debugger related. Others are API related. Inconsistency in a library interface makes it harder to learn and is a wart. Is it as important as a graphics library? Probably not. But while we're building the next generation tool, we should fix things that make people wonder "why did they do this?".
It is rarely a single thing that makes all the difference to a prospective user switching over. It is the overall quality of the tool that will sway them.

> > > Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has
>
> Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference.

In a way, as Pearu has shown in f2py, this is already possible by jiggering the stride and dimension entries, so this doesn't even require a change to the array descriptor (I don't think...). We could supply functions that return a Fortran layout array. This would be beneficial for some applications outside of what we're discussing now that use Fortran extensions heavily. As long as it is transparent to the extension writer (which I think it can be), it sounds fine. I think the default constructor should return a C layout array, though, and that is what 99% of users will use.

eric
|
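[Editor's note: the stride-jiggering Eric describes is exactly how a Fortran-layout array coexists with C-layout ones today. Modern NumPy, an anachronism here used only to illustrate the idea, exposes both layouts; they share indexing semantics and differ only in strides:]

```python
import numpy as np

c = np.arange(6.0).reshape(2, 3)  # C (row-major) layout
f = np.asfortranarray(c)          # same values, Fortran (column-major) layout

print(c.strides)  # (24, 8): each row of three 8-byte floats is contiguous
print(f.strides)  # (8, 16): columns are contiguous instead

# Indexing is unchanged -- the layout is transparent to the user,
# which is the property Eric wants for extension writers.
print(np.array_equal(c, f))  # True
```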
From: Konrad H. <hi...@cn...> - 2002-06-11 19:15:19
|
> I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add,x) until Perry pointed it out to me.

It is an issue in much of my code, which contains stuff written with NumPy in mind as well as code using only standard Python operations (i.e. reduce()) which might however be applied to array objects. I also use arrays and nested lists interchangeably in many situations (NumPy functions accept nested lists instead of array arguments). Especially in interactive use, nested lists are easier to type.

> There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with

True, but this affects far fewer programs. Most of my code never modifies arrays after their creation, and then the difference in indexing behaviour doesn't matter.

> I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists

As I said, I often use one or the other as a matter of convenience. I have always considered them similar types with somewhat different specialized behaviour. The most common situation is building up some table with lists (making use of the append function) and then converting the final construct into an array or not, depending on whether this seems advantageous.

> in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative

List comprehensions work in exactly the same way, by looping over the outermost index.

> > 2) Minimization of code breakage.
>
> Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case.

The current interface is not inconsistent.
It follows a different logic than what some users expect, but there is a logic behind it. The current rules are the result of lengthy discussions and lengthy tests, though admittedly by a rather small group of people. If you arrange your arrays according to that logic, you almost never need to specify explicit axis arguments.

> Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for.

I disagree very much here. The fewer calls that are affected, the fewer mistakes are made, and the fewer modules have to be modified at all. Moreover, the functions that currently use axis=1 are more specialized and more likely to be called in similar contexts. They are also, in my limited experience, less often called with nested list arguments.

I don't expect fixes to be as easy as searching for function names and adding an axis argument. Python is a very dynamic language, in which functions are objects like all others. They can be passed as arguments, stored in dictionaries and lists, assigned to variables, etc. In fact, instead of modifying any code, I'd rather write an interface module that emulates the old behaviour, which after all differs only in the default for one argument. The problem with this is that it adds another function call layer, which is rather expensive in Python. Which makes me wonder why we need this discussion at all. It is almost no extra effort to provide two different C modules that provide the same functions with different default arguments, and neither one needs to have any speed penalty.

> True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base.

I agree with that.
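[Editor's note: Konrad's emulation idea -- an interface module re-exporting the same functions with a different default axis -- can be sketched in a few lines. This is a hypothetical illustration in modern NumPy, and it carries the extra call layer he warns about:]

```python
import functools
import numpy as np

def with_default_axis(func, default):
    """Wrap func so its default axis changes; an explicit axis still wins."""
    @functools.wraps(func)
    def wrapper(a, axis=default, **kwargs):
        return func(a, axis=axis, **kwargs)
    return wrapper

# A SciPy-style "last axis by default" variant of sum.
sum_last = with_default_axis(np.sum, -1)

x = np.arange(20).reshape(2, 10)
print(np.sum(x, axis=0).tolist())    # [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
print(sum_last(x).tolist())          # [45, 145]
print(sum_last(x, axis=0).tolist())  # explicit axis overrides the new default
```

An emulation module would simply apply such a wrapper to every function whose default it wanted to flip, which is why Konrad notes that doing the same in C would avoid the per-call overhead.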
But has anyone ever made a serious effort to find out why the whole world is not using Python? In my environment (which is too small to be representative of anything), the main reason is inertia. Most people don't want to invest any time in learning a new language, no matter what the advantages are (they remain hypothetical until you actually start to use the new language). I don't know anyone who has started to use Python and then dropped it because he was not satisfied with some aspect of the language or a library module. On the other hand, I do know projects that collapsed after a split in the user community due to some disagreement over minor details.

Konrad.
|
From: Paul B. <Ba...@st...> - 2002-06-12 15:54:57
|
eric jones wrote:

> I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add,x) until Perry pointed it out to me. There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a heavily used area of Python because of efficiency.

<Begin Rant>

I think consistency is an issue, particularly for novices. You cite the issue of slices creating views instead of copies as being the correct choice. But this decision is based solely on the perception that views are 'inherently' more efficient than copies, and not on reasons of consistency or usability. I (a seasoned user) find view behavior to be annoying and have been caught out by it several times. For example, reversing the elements of an array in place using slices, e.g. A[:] = A[::-1], will give the wrong answer unless you explicitly make a copy before doing the assignment, whereas copy behavior will do the right thing. I suggest that many novices will be caught out by this and similar examples, as I have been. Copy behavior for slices can be just as efficient as view behavior if implemented as copy-on-write.

The beauty of Python is that it allows the developer to spend much more time on consistency and usability issues than on implementation issues. Sadly, I think much of Numeric development is based solely on implementation issues, to the detriment of consistency and usability.

I don't have enough experience to say definitively whether axis=0 should be preferred over axis=-1 or vice versa. But it does appear that for the most general cases axis=0 is probably preferred. This is the default for the APL and J programming languages, on which Numeric is based. Should we not continue to follow their lead?
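[Editor's note: the pitfall Paul describes stems from slices being views. Modern NumPy kept Numeric's view semantics (though it has since made overlapping slice assignments safe), so the underlying behaviour, and the explicit-copy workaround he alludes to, can still be demonstrated:]

```python
import numpy as np

a = np.arange(5)
b = a[::-1]   # a view: no data is copied
b[0] = 99     # writes through to the underlying array
print(a)      # [ 0  1  2  3 99]

a = np.arange(5)
c = a[::-1].copy()  # an explicit copy decouples the two
c[0] = 99
print(a)      # [0 1 2 3 4] -- original untouched
```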
It might be nice to see a list of examples where axis=0 is the preferred default and the same for axis=-1. <End Rant> -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 |
From: Konrad H. <hi...@cn...> - 2002-06-12 18:10:34
|
Paul Barrett <Ba...@st...> writes:

> I think consistency is an issue, particularly for novices. You cite ...

Finally a contribution that I can fully agree with :-)

> I don't have enough experience to say definitively whether axis=0 should
> be preferred over axis=-1 or vice versa. But it does appear that for
> the most general cases axis=0 is probably preferred. This is the
> default in the APL and J programming languages, on which Numeric is
> based. Should we not continue to follow their lead? It might be nice to see

This is the internal logic I referred to briefly earlier, but I didn't have the time to explain it in more detail. Now I have :-)

The basic idea is that an array is seen as an array of array values. The N dimensions are split into two parts: the first N1 dimensions describe the shape of the "total" array, and the remaining N2 = N - N1 dimensions describe the shape of the array-valued elements of the array. I suppose some examples will help:

- A rank-1 array can be seen either as a vector of scalars (N1 = 1) or as a scalar containing a vector (N1 = 0); in practice there is no difference between these views.

- A rank-2 array can be seen as a matrix (N1 = 2), as a vector of vectors (N1 = 1), or as a scalar containing a matrix (N1 = 0). The first and the last come down to the same thing, but the middle one doesn't.

- A discretized vector field (i.e. one 3D vector value for each point on a 3D grid) is represented by a rank-6 array, with N1 = 3 and N2 = 3.

Array operations are divided into two classes, "structural" and "element" operations. Element operations do something on each individual element of an array, returning a new array with the same "outer" shape, although the element shape may be different. Structural operations work on the outer shape, returning a new array with a possibly different outer shape but the same element shape.

The most frequent element operations are addition, multiplication, etc., which work on scalar elements only. They need no axis argument at all.
Element operations that work on rank-1 elements have a default axis of -1; I think FFT has been quoted as an example a few times. There are no element operations that work on higher-rank elements, but they are imaginable. A 2D FFT routine would default to axis=-2.

Structural operations, which are by far the most frequent after scalar element operations, default to axis=0. They include reduction and accumulation, sorting, selection (take, repeat, ...), and some others.

I hope this clarifies the choice of default axis arguments in the current NumPy. It is most definitely not arbitrary or accidental. If you follow the data layout principles explained above, you almost never need to specify an explicit axis argument.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
|
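[Editor's note: Konrad's structural-versus-element distinction can be made concrete with a small sketch. Modern NumPy syntax is used for illustration only; the example array is invented.]

```python
import numpy as np

# A discretized "vector field": one length-3 vector per point of a
# 2x2 grid.  Outer shape N1 = (2, 2), element shape N2 = (3,).
field = np.arange(12).reshape(2, 2, 3)

# Structural operation: reduce over the *outer* (first) axis.
# The element shape (3,) is preserved; the outer shape shrinks.
total = np.add.reduce(field, axis=0)
print(total.shape)        # (2, 3)

# Element operation: act on each rank-1 element along the *last* axis.
# The outer shape (2, 2) is preserved; the elements are transformed.
spectra = np.fft.fft(field, axis=-1)
print(spectra.shape)      # (2, 2, 3)
```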
From: Paul F D. <pa...@pf...> - 2002-06-12 22:42:28
|
The users of Numeric at PCMDI found the 'view' semantics so annoying that they insisted their CS staff write a separate version of Numeric just to avoid it. We have since gotten out of that mess, but that is the reason MA has copy semantics.

Again, this is another issue where one is fighting over the right to 'own' the operator notation. I believe that copy semantics should win this one because it is a **proven fact** that scientists trip over it, and it is consistent with Python list semantics. People who really need view semantics could get it as previously suggested by someone, with something like x.sub[10:12, :].

There are now dead horses all over the landscape, and I for one am going to shut up.

> -----Original Message-----
> From: num...@li... [mailto:num...@li...]
> On Behalf Of Paul Barrett
> Sent: Wednesday, June 12, 2002 8:54 AM
> To: numpy-discussion
> Subject: Re: [Numpy-discussion] RE: default axis for numarray
>
> eric jones wrote:
> >
> > I think the consistency with Python is less of an issue than it seems.
> > I wasn't aware that add.reduce(x) would generate the same results as
> > the Python version of reduce(add,x) until Perry pointed it out to me.
> > There are some inconsistencies between Python the language and Numeric
> > because of the needs of the Numeric community. For instance, slices
> > create views instead of copies as in Python. This was a correct break
> > with consistency in a heavily used area of Python because of efficiency.
>
> <Begin Rant>
>
> I think consistency is an issue, particularly for novices. You cite
> the issue of slices creating views instead of copies as being the
> correct choice. But this decision is based solely on the perception
> that views are 'inherently' more efficient than copies, and not on
> reasons of consistency or usability. I (a seasoned user) find view
> behavior to be annoying and have been caught out by it several times.
> For example, reversing the elements of an array in place using slices,
> i.e. A[:] = A[::-1], will give the wrong answer unless you explicitly
> make a copy before doing the assignment, whereas copy behavior will do
> the right thing. I suggest that many novices will be caught out by
> this and similar examples, as I have been. Copy behavior for slices
> can be just as efficient as view behavior, if implemented as
> copy-on-write.
>
> The beauty of Python is that it allows the developer to spend much
> more time on consistency and usability issues than on implementation
> issues. Sadly, I think much of Numeric development is based solely on
> implementation issues, to the detriment of consistency and usability.
>
> I don't have enough experience to say definitively whether axis=0
> should be preferred over axis=-1 or vice versa. But it does appear
> that for the most general cases axis=0 is probably preferred. This is
> the default in the APL and J programming languages, on which Numeric
> is based. Should we not continue to follow their lead? It might be
> nice to see a list of examples where axis=0 is the preferred default,
> and the same for axis=-1.
>
> <End Rant>
>
> --
> Paul Barrett, PhD          Space Telescope Science Institute
> Phone: 410-338-4475        ESS/Science Software Group
> FAX:   410-338-4767        Baltimore, MD 21218
>
> _______________________________________________________________
>
> Sponsored by: ThinkGeek at http://www.ThinkGeek.com/
> _______________________________________________
> Numpy-discussion mailing list Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
|
From: Perry G. <pe...@st...> - 2002-06-13 20:22:45
|
<Paul Dubois writes>:
> There are now dead horses all over the landscape, and I for one am going
> to shut up.

Not enough dead horses for me :-). But seriously, I would like to hear from others about this issue (I already knew what Paul, Paul, Eric, Travis, and Konrad felt about this before it started up). You can either post to the mailing list or email me directly if you are the shy, retiring type.

Perry
|
From: Scott R. <ra...@ph...> - 2002-06-11 01:56:00
|
I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there, so long as things continue to improve with the language as a whole).

I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my collaborators are probably sick of my Numeric Python evangelism -- but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...).

Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for a while:

>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Anyway, these little warts that we are discussing probably haven't kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look.

On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation.

Cheers,
Scott

On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote:
> So one contentious issue a day isn't enough, huh? :-)
>
> > An issue that has been raised by scipy (most notably Eric Jones
> > and Travis Oliphant) has been whether the default axis used by
> > various functions should be changed from the current Numeric
> > default.
> > This message is not directed at determining whether we
> > should change the current Numeric behavior for Numeric, but whether
> > numarray should adopt the same behavior as the current Numeric.
> >
> > To be more specific, certain functions and methods, such as
> > add.reduce(), operate by default on the first axis. For example,
> > if x is a 2 x 10 array, then add.reduce(x) results in a
> > 10 element array, where elements in the first dimension have
> > been summed over rather than the most rapidly varying dimension.
> >
> > >>> x = arange(20)
> > >>> x.shape = (2,10)
> > >>> x
> > array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
> >        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> > >>> add.reduce(x)
> > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
>
> The issue here is both consistency across a library and speed.
>
> From the numpy.pdf, Numeric looks to have about 16 functions using
> axis=0 (or index=0, which should really be axis=0) and, counting FFT,
> about 10 functions using axis=-1. To this day, I can't remember which
> functions use which and have resorted to explicitly using axis=-1 in my
> code. Unfortunately, many of the Numeric functions that should take
> axis as a keyword still don't, so you end up just inserting -1 in the
> argument list (but this is a different issue -- it just needs to be
> fixed).
>
> SciPy always uses axis=-1 for operations. There are 60+ functions with
> this convention. Choosing -1 offers the best cache use and therefore
> should be more efficient. Defaulting to the fastest behavior is
> convenient because new users don't need any special knowledge of
> Numeric's implementation to get near peak performance. Also, there is
> never a question about which axis is used for calculations.
>
> When using SciPy and Numeric, their function sets are completely
> co-mingled. When adding SciPy and Numeric's function counts together,
> it is 70 to 16 for axis=-1 vs. axis=0.
> Even though SciPy chose a
> standard, it is impossible for the interface to become intuitive because
> of the exceptions to the rule from Numeric.
>
> So here's what I think. All functions should default to the same axis so
> that the interface to common functions can become second nature for new
> users and experts alike. Further, the chosen axis should be the most
> efficient one for the most cases.
>
> There are actually a few functions that, taken in isolation, I think
> should have axis=0. take() is an example. But, for the sake of
> consistency, it too should use axis=-1.
>
> It has been suggested to recommend that new users always specify axis=?
> as a keyword in functions that require an axis argument. This might be
> fine when writing modules, but always having to type:
>
> >>> sum(a, axis=-1)
>
> in command-line mode is a real pain.
>
> Just a point about the larger picture here... The changes we're
> discussing are intended to clean up the warts on Numeric -- and, as good
> as it is overall, these are warts in terms of usability. Interfaces
> should be consistent across a library. The return types from functions
> should be consistent regardless of input type (or shape). Default
> arguments to the same keyword should also be consistent across
> functions. Some issues are left to debate (i.e. using axis=-1 or axis=0
> as default, returning arrays or scalars from Numeric functions, and
> indexing), but the choice made should be applied as consistently as
> possible.
>
> We should also strive to make it as easy as possible to write generic
> functions that work for all array types (Int, Float, Float32, Complex,
> etc.) -- yet another debate to come.
>
> Changes are going to create some backward incompatibilities, and that is
> definitely a bummer. But some changes are also necessary before the
> community gets big.
> I know the community is already a reasonable size,
> but I also believe, based on the strength of Python, Numeric, and
> libraries such as Scientific and SciPy, the community can grow by two
> orders of magnitude over the next five years. This kind of growth can't
> occur if only savvy developers see the benefits of the elegant language.
> It can only occur if the general scientist sees Python as a compelling
> alternative to Matlab (and IDL) as their day-in/day-out command-line
> environment for scientific/engineering analysis. Making the interface
> consistent is one of several steps to making Python more attractive to
> this community.
>
> Whether the changes made for numarray should be migrated back into
> Numeric is an open question. I think they should, but see Konrad's
> counterpoint. I'm willing for SciPy to be the intermediate step in the
> migration between the two, but also think that is sub-optimal.
>
> > Some feel it is contrary to expectations that the least rapidly
> > varying dimension should be operated on by default. There are
> > good arguments for both sides. For example, Konrad Hinsen has
> > argued that the current behavior is most compatible with the behavior
> > of other Python sequences. For example,
> >
> > >>> sum = 0
> > >>> for subarr in x:
> >         sum += subarr
> >
> > acts on the first axis in effect. Likewise
> >
> > >>> reduce(add, x)
> >
> > does likewise. In this sense, Numeric is currently more consistent
> > with Python behavior. However, there are other functions that
> > operate on the most rapidly varying dimension. Unfortunately
> > I cannot currently access my old mail, but I think the rule
> > that was proposed under this argument was that if the 'reduction'
> > operation was of a structural kind, the first dimension is used.
> > If the reduction or processing step is 'time-series' oriented
> > (e.g., FFT, convolve) then the last dimension is the default.
> > On the other hand, some feel it would be much simpler to understand
> > if the last axis were always the default.
> >
> > The question is whether there is a consensus for one approach or
> > the other. We raised this issue at a scientific Birds-of-a-Feather
> > session at the last Python Conference. The sense I got there was
> > that most were for the status quo, keeping the behavior as it is
> > now. Is the same true here? In the absence of consensus or a
> > convincing majority, we will keep the behavior the same for backward
> > compatibility purposes.
>
> Obviously, I'm more opinionated about this now than I was then. I
> really urge you to consider using axis=-1 everywhere. SciPy is not the
> only scientific library, but I think it adds the most functions with a
> similar signature (the stats module is full of them). I very much hope
> for a consistent interface across all of Python's scientific functions,
> because command-line users aren't going to care whether sum() and
> kurtosis() come from different libraries; they just want them to behave
> consistently.
>
> eric
>
> > Perry
>
> _______________________________________________________________
>
> Don't miss the 2002 Sprint PCS Application Developer's Conference
> August 25-28 in Las Vegas - http://devcon.sprintpcs.com/adp/index.cfm?source=osdntextlink
>
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

-- 
Scott M. Ransom                Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492                   3600 University St., Rm 338
email: ra...@ph...                      Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
|
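[Editor's note: Eric's cache argument comes down to memory layout. In a C-ordered (row-major) array the last axis is the contiguous one, so reducing over it walks memory sequentially. A small sketch in modern NumPy, for illustration only:]

```python
import numpy as np

x = np.arange(20).reshape(2, 10)

# In C (row-major) order the last axis is contiguous: stepping along
# axis -1 moves by one item, stepping along axis 0 moves by a whole
# row (ten items).
print(x.strides)                  # e.g. (80, 8) with 8-byte integers

# Reducing over axis -1 therefore reads each row sequentially...
print(np.add.reduce(x, axis=-1))  # [ 45 145]

# ...while the Numeric default, axis=0, strides across rows:
print(np.add.reduce(x, axis=0))   # [10 12 14 16 18 20 22 24 26 28]
```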
From: Travis O. <oli...@ie...> - 2002-06-11 03:51:25
|
On Mon, 2002-06-10 at 19:55, Scott Ransom wrote:
> I have to admit that I agree with all of what Eric has to say
> here -- even if it does cause some code breakage (I'm certainly
> willing to do some maintenance on my code/modules that are
> floating here and there so long as things continue to improve
> with the language as a whole).

I'm generally of the same opinion.

> I do think consistency is a very important aspect of getting
> Numeric/Numarray accepted by a larger user base (and believe
> me, my collaborators are probably sick of my Numeric Python
> evangelism (but I like to think also a bit jealous of my NumPy
> usage as they continue struggling with one-off C and Fortran
> routines...)).

Another important factor is the support libraries. I know that something like Simulink (Matlab) is important to many of my colleagues in engineering. Simulink is the MathWorks version of visual programming, which lets the user create a circuit visually that is then processed. I believe there was a good start on this sort of thing presented at the last Python Conference, which was very encouraging. Other colleagues require something like a compiler to get C code that will compile on a DSP board from a script and/or design session. I believe something like this would be very beneficial.

> Another example of a glaring inconsistency in the current
> implementation is this little number that has been bugging me
> for a while:
>
> >>> arange(10, typecode='d')
> array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
> >>> ones(10, typecode='d')
> array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
> >>> zeros(10, typecode='d')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: an integer is required
> >>> zeros(10, 'd')
> array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

This is now fixed in CVS, along with other keyword problems. The ufunc methods reduce and accumulate also now take a keyword argument in CVS.

-Travis
|
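[Editor's note: the keyword form Travis mentions survives in modern NumPy, where the ufunc methods reduce and accumulate both accept axis as a keyword. A sketch for illustration:]

```python
import numpy as np

x = np.arange(20).reshape(2, 10)

# reduce with an explicit axis keyword (the fix Travis describes for
# Numeric's CVS, shown here with modern NumPy syntax):
print(np.add.reduce(x, axis=0))   # sums down columns
print(np.add.reduce(x, axis=-1))  # sums along rows

# accumulate takes the same keyword and returns running sums:
print(np.add.accumulate(x, axis=-1)[0])  # running sums of row 0
```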
From: Paul F D. <pa...@pf...> - 2002-06-11 03:56:12
|
I guess the argument for uniformity is pretty persuasive after all. (I know, I don't fit in on the Net; you can change my mind.)

Actually, don't we have a quick and dirty out here? Suppose we make the more uniform choice for Numarray, and then make a new module, say NumericCompatibility, which defines aliases to everything in Numarray that is the same as in Numeric, and for the rest defines functions with the same names but the Numeric defaults, implemented by calling the ones in Numarray. Then changing "import Numeric" to "import NumericCompatibility as Numeric" ought to be enough to get someone working, or close to working, again.

Someone posted something about "retrofitting" stuff from Numarray to Numeric. I cannot say strongly enough that I oppose this. Numeric itself must be frozen ASAP and eliminated eventually, or there is no point to having developed a replacement that is easier to expand and maintain. We would have just doubled our workload for nothing.
|
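[Editor's note: Paul's NumericCompatibility idea amounts to a thin wrapper module that re-exports the new library while restoring the old defaults. A hypothetical sketch follows; the module and helper names are invented for illustration, and modern NumPy stands in for Numarray.]

```python
# numeric_compatibility.py -- hypothetical shim in the spirit of the
# proposed NumericCompatibility module.
import functools

import numpy as np

# Names whose behavior is unchanged are plain aliases:
arange = np.arange
zeros = np.zeros

def _with_default_axis(func, default_axis):
    """Wrap func so its default axis matches old Numeric (axis=0)."""
    @functools.wraps(func)
    def wrapper(a, axis=default_axis, *args, **kwargs):
        return func(a, axis=axis, *args, **kwargs)
    return wrapper

# Supposing the new library had settled on axis=-1, the shim restores
# the old Numeric default of axis=0 for structural operations:
sum = _with_default_axis(np.sum, 0)
cumsum = _with_default_axis(np.cumsum, 0)
```

A caller could then write `import numeric_compatibility as Numeric` and keep old code running largely unchanged, as Paul suggests.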