From: eric j. <er...@en...> - 2002-06-11 17:44:04
> "eric jones" <er...@en...> writes:
>
> > The issue here is both consistency across a library and speed.
>
> Consistency, fine. But not just within one package, also between
> that package and the language it is implemented in.
>
> Speed, no. If I need a sum along the first axis, I won't replace
> it by a sum across the last axis just because that is faster.

The default axis choice influences how people choose to lay out their
data in arrays. If the default is to sum down columns, then users lay
out their data so that this is the order of computation -- which
results in strided operations. There are cases where you need to
reduce over multiple data sets, etc., which is what the axis=? flag is
for. But choosing the default to also be the most efficient one just
makes sense.

The cost is even higher for wrappers around C libraries not written
explicitly for Python (which is most of them), because you have to
re-order the memory before passing the variables into the C loop. Of
course, axis=0 is faster for Fortran libraries with wrappers that are
smart enough to recognize this (Pearu's f2py-wrapped libraries now
recognize this sort of thing). However, the marriage to C is more
important, as future growth will come in this area more than in
Fortran.

> > >From the numpy.pdf, Numeric looks to have about 16 functions using
> > axis=0 (or index=0 which should really be axis=0) and, counting FFT,
> > about 10 functions using axis=-1. To this day, I can't remember which
>
> If you weight by frequency of usage, the first group gains a lot in
> importance. I just scanned through some of my code; almost all of the
> calls to Numeric routines are to functions whose default axis
> is zero.

Right, but I think all the reduce operators (sum, product, etc.) should
have been axis=-1 in the first place.
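[For illustration: the layout argument above can be seen directly in modern NumPy, Numeric's successor. This is a sketch of the idea, not Numeric-era code -- a default-constructed array is C-ordered, so its last axis is the contiguous one, and reducing over axis=-1 walks memory sequentially while reducing over axis=0 strides a full row per element.]

```python
import numpy as np

# Default construction gives a C-ordered (row-major) array:
# the LAST axis is the contiguous, rapidly varying one.
a = np.zeros((3, 4))            # float64 -> 8 bytes per element

print(a.strides)                # (32, 8): axis 0 steps 32 bytes, axis -1 steps 8
print(a.flags['C_CONTIGUOUS'])  # True

# Reducing over the last axis reads memory sequentially;
# reducing over the first axis jumps a full row per element.
row_sums = a.sum(axis=-1)       # sequential access, shape (3,)
col_sums = a.sum(axis=0)        # strided access, shape (4,)
```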
> > Unfortunately, many of the Numeric functions that should still
> > don't take axis as a keyword, so you end up just inserting -1 in the
> > code.
>
> That is certainly something that should be fixed, and I suppose no one
> objects to that.

Sounds like Travis already did it. Thanks.

> My vote is for keeping axis defaults as they are, both because the
> choices are reasonable (there was a long discussion about them in the
> early days of NumPy, and the defaults were chosen based on other array
> languages that had already been in use for years) and because any
> change would cause most existing NumPy code to break in many places,
> often giving wrong results instead of an error message.
>
> If a uniformization of the default is desired, I vote for axis=0,
> for two reasons:
> 1) Consistency with Python usage.

I think the consistency with Python is less of an issue than it seems.
I wasn't aware that add.reduce(x) would generate the same results as
the Python version of reduce(add, x) until Perry pointed it out to me.

There are some inconsistencies between Python the language and Numeric
because of the needs of the Numeric community. For instance, slices
create views instead of copies as in Python. This was a correct break
with consistency in a heavily used area of Python, made for the sake
of efficiency. I don't see choosing axis=-1 as a break with Python --
multi-dimensional arrays are inherently different from, and used
differently than, lists of lists in Python. Further, reduce() is a
"corner" of the Python language that has been superseded by list
comprehensions. Choosing an alternative behavior that is generally
better for array operations, as in the case of slices as views, is
worth the change.

> 2) Minimization of code breakage.

Fixes will be necessary for sure, and I wish that weren't the case.
They will be necessary if we choose a consistent interface in either
case. Choosing axis=0 or axis=-1 will not change what needs to be
fixed -- only the function names searched for.
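[Both consistency points above can be checked concretely. A sketch in modern NumPy, used here only as an illustration of the behavior being discussed: add.reduce with its default axis=0 matches Python's reduce(add, ...) over the outer sequence, and array slices are views where list slices are copies.]

```python
from functools import reduce
import operator
import numpy as np

x = np.array([[1, 2], [3, 4]])

# add.reduce with default axis=0 matches Python's reduce over the
# outer sequence: both sum the rows.
assert (np.add.reduce(x) == reduce(operator.add, list(x))).all()  # [4, 6]

# Python list slices are copies...
lst = [1, 2, 3]
sub = lst[:2]
sub[0] = 99
assert lst[0] == 1        # original list unchanged

# ...while array slices are views into the same memory.
arr = np.array([1, 2, 3])
view = arr[:2]
view[0] = 99
assert arr[0] == 99       # original array modified
```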
> > We should also strive to make it as easy as possible to write generic
> > functions that work for all array types (Int, Float, Float32, Complex,
> > etc.) -- yet another debate to come.
>
> What needs to be improved in that area?

Comparisons of complex numbers. But let's save that debate for later.

> > Changes are going to create some backward incompatibilities and that is
> > definitely a bummer. But some changes are also necessary before the
> > community gets big. I know the community is already a reasonable size,
>
> I'd like to see evidence that changing the current NumPy behaviour
> would increase the size of the community. It would first of all split
> the current community, because many users (like myself) do not have
> enough time to spare to go through their code line by line in order to
> check for incompatibilities. That many others would switch to Python
> if only some changes were made is merely a hypothesis.

True. But I can tell you that we're definitely doing something wrong
now. We have a superior language that is easier to integrate with
legacy code and less expensive than the best competing alternatives.
And, though I haven't done a serious market survey, I feel safe in
saying we have significantly less than 1% of the potential user base.
Even in communities where Python is relatively prevalent, like
astronomy, I would bet the every-day user base is less than 5% of the
whole.

There are a lot of holes to fill (graphics, comprehensive libraries,
etc.) before we get up to the capabilities and quality of user
interface that these tools have. Some of the interface problems are
GUI- and debugger-related. Others are API-related. Inconsistency in a
library interface makes it harder to learn, and is a wart. Is it as
important as a graphics library? Probably not. But while we're
building the next-generation tool, we should fix the things that make
people wonder "why did they do this?".
It is rarely a single thing that makes all the difference to a
prospective user switching over. It is the overall quality of the tool
that will sway them.

> > Some feel that it is contrary to expectations that the least rapidly
> > varying dimension should be operated on by default. There are
> > good arguments for both sides. For example, Konrad Hinsen has
>
> Actually the argument is not for the least rapidly varying
> dimension, but for the first dimension. The internal data layout
> is not significant for most Python array operations. We might
> for example offer a choice of C style and Fortran style data layout,
> enabling users to choose according to speed, compatibility, or
> just personal preference.

In a way, as Pearu has shown in f2py, this is already possible by
jiggering the stride and dimension entries, so it doesn't even require
a change to the array descriptor (I don't think...). We could supply
functions that return a Fortran-layout array. This would be beneficial
for some applications outside of what we're discussing now that use
Fortran extensions heavily. As long as it is transparent to the
extension writer (which I think it can be), it sounds fine. I think
the default constructor should return a C-layout array, though, and
that is what 99% of users will use.

eric

> Konrad.
> --
> -------------------------------------------------------------------------
> Konrad Hinsen                            | E-Mail: hi...@cn...
> Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
> Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
> 45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
> France                                   | Nederlands/Francais
> -------------------------------------------------------------------------
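[The stride-jiggering Eric describes survives in modern NumPy, Numeric's successor; the following sketch (illustrative, not Numeric-era code) shows that a C-layout and a Fortran-layout array are indistinguishable to Python-level code and differ only in their strides, exactly as the "transparent to the extension writer" argument requires.]

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)   # C (row-major) layout, float64
f = np.asfortranarray(a)           # same values, Fortran (column-major) layout

# Python-level code cannot tell the two apart...
assert (a == f).all()
assert a.flags['C_CONTIGUOUS'] and f.flags['F_CONTIGUOUS']

# ...the difference lives entirely in the strides
# (bytes to step along each axis):
print(a.strides)   # (24, 8): rows are contiguous
print(f.strides)   # (8, 16): columns are contiguous
```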