From: Travis O. <oli...@ee...> - 2004-01-19 21:33:42
|
Numarray is making great progress and is quite usable for many purposes. An idea that was championed by some is that the Numeric code base would stay static and be replaced entirely by Numarray.

However, Numeric is currently used in a large installed base. In particular SciPy uses Numeric as its core array. While no doubt numarray arrays will be supported in the future, the speed of the less bulky Numeric arrays and the typical case that we encounter in SciPy of many, small arrays will make it difficult for people to abandon Numeric entirely with its comparatively light-weight arrays.

In the development of SciPy we have encountered issues in Numeric that we feel need to be fixed. As this has become an important path to success of several projects (both commercial and open) it is absolutely necessary that these issues be addressed.

The purpose of this email is to assess the attitude of the community regarding how these changes to Numeric should be accomplished.

These are the two options we can see:

* freeze old Numeric 23.x and make all changes to Numeric 24.x still keeping Numeric separate from SciPy
* freeze old Numeric 23.x and subsume Numeric into SciPy essentially creating a new SciPy arrayobject that is fast and lightweight. Anybody wanting this new array object would get it by installing scipy_base. Numeric would never change in the future but the array in scipy_base would.

It is not an option to wait for numarray to get fast enough as these issues need to be addressed now. Ultimately I think it will be a wise thing to have two implementations of arrays: one that is fast and lightweight optimized for many relatively small arrays, and another that is optimized for large-scale arrays. Eventually, the use of these two underlying implementations should be automatic and invisible to the user.

A few of the particular changes we need to make to the Numeric arrayobject are:

1) change the coercion model to reflect Numarray's choice and eliminate the savespace crutch.
2) Add indexing capability to Numeric arrays (similar to Numarray's)
3) Improve the interaction between Numeric arrays and scalars.
4) Optimization:

Again, these changes are going to be made to some form of the Numeric arrays. What I am really interested in knowing is the attitude of the community towards keeping Numeric around. If most of the community wants to see Numeric go away then we will be forced to bring the Numeric array under the SciPy code-base and own it there.

Your feedback is welcome and appreciated.

Sincerely,

Travis Oliphant and other SciPy developers |
From: Perry G. <pe...@st...> - 2004-01-19 22:13:54
|
Travis Oliphant writes:
> Numarray is making great progress and is quite usable for many purposes. An idea that was championed by some is that the Numeric code base would stay static and be replaced entirely by Numarray.
>
> However, Numeric is currently used in a large installed base. In particular SciPy uses Numeric as its core array. While no doubt numarray arrays will be supported in the future, the speed of the less bulky Numeric arrays and the typical case that we encounter in SciPy of many, small arrays will make it difficult for people to abandon Numeric entirely with its comparatively light-weight arrays.

I'd like to ask if the numarray option couldn't at least be considered. In particular with regard to speed, we'd like to know what the necessary threshold is. For many ufuncs, numarray is within a factor of 3 or so of Numeric for small arrays. Is this good enough or not? What would be good enough? It would probably be difficult to make it as fast in all cases, but how close does it have to be? A factor of 2? 1.5? We haven't gotten very much feedback on specific numbers in this regard.

Are there other aspects of numarray performance that are a problem? What specifically? We don't have the resources to optimize everything in case it might affect someone. We need to know that it is a particular problem with users to give it some priority (and know what the necessary threshold is for acceptable performance).

Perhaps the two (Numeric and numarray) may need to coexist for a while, but we would like to isolate the issues that make that necessary. That hasn't really happened yet.

Travis, do you have any specific numarray speed issues that have arisen from your benchmarking or use that we can look at?

Perry Greenfield |
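The question of what slowdown factor is acceptable is easier to answer with numbers in hand. The following is a minimal sketch, not code from either project, of how one might tabulate the slowdown of one implementation relative to another as a function of array size. Everything in it is an assumption made for illustration: the helper names `per_call_seconds` and `slowdown_by_size` are hypothetical, and the two `double_*` functions are plain-Python stand-ins so that the sketch runs without Numeric or numarray installed.

```python
import timeit


def per_call_seconds(func, arg, repeat=3, number=1000):
    """Best-of-`repeat` wall-clock time for a single call of func(arg)."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def slowdown_by_size(baseline, candidate, make_input, sizes):
    """Map each size to candidate_time / baseline_time for that size."""
    ratios = {}
    for n in sizes:
        data = make_input(n)
        ratios[n] = per_call_seconds(candidate, data) / per_call_seconds(baseline, data)
    return ratios


# Hypothetical stand-ins: both double every element, but the second does
# extra fixed work on every call to mimic a higher per-call overhead.
def double_light(values):
    return [v * 2 for v in values]


def double_heavy(values):
    bookkeeping = {"shape": (len(values),), "contiguous": True}
    bookkeeping["size"] = len(values)
    return [v * 2 for v in values]


if __name__ == "__main__":
    table = slowdown_by_size(
        double_light, double_heavy, lambda n: list(range(n)), [1, 10, 100, 1000]
    )
    for n, ratio in table.items():
        print(f"n={n:5d}  heavy/light = {ratio:.2f}")
```

Swapping the stand-ins for real array operations from the two packages would give the kind of per-size ratio table that could be compared against a stated threshold such as 2 or 1.5.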
From: Tim H. <tim...@ie...> - 2004-01-21 23:23:01
|
Arthur wrote:

[SNIP]

> Which, to me, seems like a worthy goal.
>
> On the other hand, it would seem that the goal of something to move into the core would be performance optimized at the range of array size most commonly encountered. Rather than for the extraordinary, which seems to be the goal of numarray, responding to specific needs of the numarray development team's applications.

I'm not sure where you came up with this, but it's wrong on at least two counts. The first is that last I heard the crossover point where Numarray becomes faster than Numeric is about 2000 elements. It would be nice if that becomes smaller, but I certainly wouldn't call it extreme. In fact I'd venture that the majority of cases where numeric operations are a bottleneck would already be faster under Numarray. In my experience, while it's not uncommon to use short arrays, it is rare for them to be a bottleneck.

The second point is the relative speediness of Numeric at low array sizes is the result that nearly all of it is implemented in C, whereas much of Numarray is implemented in Python. This results in a larger overhead for Numarray, which is why it's slower for small arrays. As I understand it, the decision to base most of Numarray in Python was driven by maintainability; it wasn't an attempt to optimize large arrays at the expense of small ones.

> Has the core Python development team given out clues about their feelings/requirements for a move of either Numeric or numarray into the core?

I believe that one major requirement was that the numeric community come to a consensus on an array package and be willing to support it in the core. There may be other stuff.

> It concerns me that this thread isn't trafficked.

I suspect that most of the exchange has taken place on num...@li....

[SNIP]

-tim |
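The crossover argument can be made concrete with a simple linear cost model: each call costs a fixed overhead plus a per-element cost. The sketch below is illustrative only. The nanosecond figures are invented, chosen so that the result lands on the roughly 2000-element crossover quoted above; they are not measurements of Numeric or numarray, and the function names are hypothetical.

```python
def call_time(overhead, per_element, n):
    """Linear cost model: fixed per-call overhead plus per-element work."""
    return overhead + per_element * n


def crossover_size(overhead_a, per_element_a, overhead_b, per_element_b):
    """Size above which implementation B is faster than implementation A.

    Returns 0 if B is never slower, and None if B never catches up.
    """
    if overhead_b <= overhead_a and per_element_b <= per_element_a:
        return 0
    if per_element_b >= per_element_a:
        return None
    return (overhead_b - overhead_a) / (per_element_a - per_element_b)


# Invented figures in nanoseconds (assumptions, not benchmarks):
# A has low call overhead but a slower inner loop,
# B has high call overhead but a faster inner loop.
A_OVERHEAD, A_PER_ELEMENT = 1000, 10
B_OVERHEAD, B_PER_ELEMENT = 11000, 5

if __name__ == "__main__":
    n_star = crossover_size(A_OVERHEAD, A_PER_ELEMENT, B_OVERHEAD, B_PER_ELEMENT)
    print("crossover at about", n_star, "elements")
    for n in (10, 100, 10000):
        print(n, call_time(A_OVERHEAD, A_PER_ELEMENT, n), call_time(B_OVERHEAD, B_PER_ELEMENT, n))
```

Under this model, lowering the fixed overhead of the slower-starting implementation moves the crossover point toward smaller arrays without touching the inner loops.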
From: Robert K. <rk...@uc...> - 2004-01-21 23:41:07
|
On Wed, Jan 21, 2004 at 04:22:43PM -0700, Tim Hochberg wrote: [snip] > The second point is the relative speediness of Numeric at low array > sizes is the result that nearly all of it is implemented in C, whereas > much of Numarray is implemented in Python. This results in a larger > overhead for Numarray, which is why it's slower for small arrays. As I > understand it, the decision to base most of Numarray in Python was > driven by maintainability; it wasn't an attempt to optimize large arrays > at the expense of small ones. Has the numarray team (or anyone else for that matter) looked at using Pyrex[1] to implement any part of numarray? If not, then that's my next free-time experiment (i.e. avoiding homework while still looking productive at the office). [1] http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ -- Robert Kern rk...@uc... "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter |
From: Perry G. <pe...@st...> - 2004-01-23 02:49:54
|
Robert Kern writes:
> [snip]
>
> Tim Hochberg writes:
> > The second point is the relative speediness of Numeric at low array sizes is the result that nearly all of it is implemented in C, whereas much of Numarray is implemented in Python. This results in a larger overhead for Numarray, which is why it's slower for small arrays. As I understand it, the decision to base most of Numarray in Python was driven by maintainability; it wasn't an attempt to optimize large arrays at the expense of small ones.
>
> Has the numarray team (or anyone else for that matter) looked at using Pyrex[1] to implement any part of numarray? If not, then that's my next free-time experiment (i.e. avoiding homework while still looking productive at the office).

We had looked at it at least a couple of times. I don't remember now all the conclusions, but I think one of the problems was that it wasn't as useful when one had to deal with data types not used in python itself (e.g., unsigned int16). I might be wrong about that.

Numarray generates a lot of C code directly for the actual array computations. That is neither the slow part, nor the hard part to write. It is the array computation setup that is complicated. Much of that is now in C (and we do worry that it has greatly added to the complexity). Perhaps that part could be better handled by pyrex.

I think some of the remaining overhead has to do with intrinsic python calls, and the differences between the simpler type used for Numeric versus the new style classes used for numarray. Don't hold me to that however.

Perry |
From: Paul P. <pa...@pr...> - 2004-01-23 07:30:35
|
Perry Greenfield wrote: > ... > We had looked at it at least a couple of times. I don't remember now > all the conclusions, but I think one of the problems was that > it wasn't as useful when one had to deal with data types not > used in python itself (e.g., unsigned int16). I might be wrong > about that. I would guess that the issue is more whether it is natively handled by Pyrex than whether it is handled by Python. Is there a finite list of these types that Numarray handles? If you have a list I could generate a patch to Pyrex that would support them. We could then ask Greg whether he could add them to Pyrex core or refactor it so that he doesn't have to. > Numarray generates a lot of c code directly for the actual > array computations. That is neither the slow part, nor the > hard part to write. It is the array computation setup that > is complicated. Much of that is now in C (and we do worry > that it has greatly added to the complexity). Perhaps that > part could be better handled by pyrex. It sounds like it. > I think some of the remaining overhead has to do with intrinsic > python calls, and the differences between the simpler type used > for Numeric versus the new style classes used for numarray. > Don't hold me to that however. Pyrex may be able to help with at least one of these. Calls between Pyrex-coded functions usually go at C speeds (although method calls may be slower). I don't know enough about the new-style, old-style issue to know about whether Pyrex can help with that but I would guess it might because a Pyrex "extension type" is more like a C extension type than a Python instance object. That implies some faster method lookup and calling. Numeric is the exact type of project Pyrex is designed for. And of course it works seamlessly with pre-existing Python and C code so you can selectively port things. Paul Prescod |
From: Francesc A. <fa...@op...> - 2004-01-23 09:39:08
|
On Friday 23 January 2004 08:24, Paul Prescod wrote:
> Perry Greenfield wrote:
> > ...
> > We had looked at it at least a couple of times. I don't remember now all the conclusions, but I think one of the problems was that it wasn't as useful when one had to deal with data types not used in python itself (e.g., unsigned int16). I might be wrong about that.
>
> I would guess that the issue is more whether it is natively handled by Pyrex than whether it is handled by Python. Is there a finite list of these types that Numarray handles? If you have a list I could generate a patch to Pyrex that would support them. We could then ask Greg whether he could add them to Pyrex core or refactor it so that he doesn't have to.

I think the question was rather whether Pyrex would be able to work with templates (in the sense of C++), i.e. whether it can generate different functions depending on the datatypes passed to them. You can see some previous discussion on that list in:

http://sourceforge.net/mailarchive/forum.php?thread_id=1642778&forum_id=4890

I put the question to Greg, and here is his answer:

http://sourceforge.net/mailarchive/forum.php?thread_id=1645713&forum_id=4890

So, it seems that he didn't like the idea of implementing "templates" in Pyrex.

> > Numarray generates a lot of c code directly for the actual array computations. That is neither the slow part, nor the hard part to write. It is the array computation setup that is complicated. Much of that is now in C (and we do worry that it has greatly added to the complexity). Perhaps that part could be better handled by pyrex.
>
> It sounds like it.

Yeah, I'm quite convinced that a mix between Pyrex and the existing solution in numarray for dealing with templates could be worth the effort. At least, some analysis could be done on that aspect.

> > I think some of the remaining overhead has to do with intrinsic python calls, and the differences between the simpler type used for Numeric versus the new style classes used for numarray. Don't hold me to that however.
>
> Pyrex may be able to help with at least one of these. Calls between Pyrex-coded functions usually go at C speeds (although method calls may be slower).

Well, that should be clarified: that's only true for Pyrex cdef functions (i.e. C functions written in Pyrex). Pyrex functions that can be called from Python take the same time whether they are called from Python or from within the same Pyrex extension. See some timings I did on that subject some time ago:

http://sourceforge.net/mailarchive/message.php?msg_id=3782230

Cheers,

-- 
Francesc Alted
Departament de Ciències Experimentals
Universitat Jaume I. Castelló de la Plana. Spain |
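The "templates" under discussion amount to emitting one C function per element type from a single source pattern. As a rough illustration of that idea only, here is a minimal Python sketch that stamps out an element-wise add kernel for several types from a string template. It is not numarray's actual code generator: the template text, the `add_<name>` naming scheme, and the type table are all hypothetical, and mapping `short` to a 16-bit integer is an assumption that holds on common platforms but is not guaranteed by C.

```python
# One C function pattern, with placeholders for the type-specific parts.
C_TEMPLATE = """\
static void add_{name}(long n, const {ctype} *a, const {ctype} *b, {ctype} *out)
{{
    long i;
    for (i = 0; i < n; i++) {{
        out[i] = a[i] + b[i];
    }}
}}
"""

# Hypothetical (array type name, C type) pairs; note the unsigned 16-bit
# integer, which has no direct counterpart among Python's built-in types.
TYPES = [
    ("Int16", "short"),
    ("UInt16", "unsigned short"),
    ("Float32", "float"),
    ("Float64", "double"),
]


def generate_add_kernels(types=TYPES):
    """Return C source containing one add kernel per (name, ctype) pair."""
    return "\n".join(
        C_TEMPLATE.format(name=name, ctype=ctype) for name, ctype in types
    )


if __name__ == "__main__":
    print(generate_add_kernels())
```

With this approach the inner loops stay trivial to write, which matches the remark quoted above that the generated computation code is neither the slow part nor the hard part; the setup code around it is a separate problem.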
From: Chris B. <Chr...@no...> - 2004-01-26 17:51:12
|
I remember that thread clearly, as I think making it easy to write new Ufuncs (and others) that perform at C speed could make a real difference to how effective SciPy ultimately is. I say SciPy, because I believe a large collection of special purpose optimized functions probably doesn't belong in Numarray itself.

Francesc Alted wrote:
> So, it seems that he didn't like the idea of implementing "templates" in Pyrex.

Yes, I remember that answer, and was disappointed, though the logic of not re-implementing C++ templates is pretty obvious. Which brings up the obvious question: why not use C++ templates themselves? That is what Blitz does. This points to weave.blitz as the obvious way to write optimized special purpose functions for SciPy. Does weave.blitz work with Numarray yet? Clearly it's time for me to check it out more...

> Yeah, I'm quite convinced that a mix between Pyrex and the existing solution in numarray for dealing with templates could be worth the effort. At least, some analysis could be done on that aspect.

Allowing Pyrex to use templates would be great, but how would that be better than weave.blitz? Or maybe Pyrex could use blitz.

I'm kind of over my head here, but I hope something comes of this.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
Chr...@no... |
From: Paul P. <pa...@pr...> - 2004-01-22 04:21:11
|
Tim Hochberg wrote: >... > > The second point is the relative speediness of Numeric at low array > sizes is the result that nearly all of it is implemented in C, whereas > much of Numarray is implemented in Python. This results in a larger > overhead for Numarray, which is why it's slower for small arrays. As I > understand it, the decision to base most of Numarray in Python was > driven by maintainability; it wasn't an attempt to optimize large arrays > at the expense of small ones. What about Pyrex? If you code Pyrex as if it were exactly Python you won't get much optimization. But if you code it as if it were 90% as maintainable as Python you can often get 90% of the speed of C, which is pretty damn close to having all of the best of both worlds. If you point me to a few key functions in Numarray I could try to recode them in Pyrex and do some benchmarking for you (only if Pyrex is a serious option of course!). Paul Prescod |
From: Konrad H. <hi...@cn...> - 2004-01-20 11:15:45
|
On 19.01.2004, at 21:32, Travis Oliphant wrote:

> These are the two options we can see:
> * freeze old Numeric 23.x and make all changes to Numeric 24.x still keeping Numeric separate from SciPy
> * freeze old Numeric 23.x and subsume Numeric into SciPy essentially creating a new SciPy arrayobject that is fast and lightweight. Anybody wanting this new array object would get it by installing scipy_base. Numeric would never change in the future but the array in scipy_base would.

That depends on the exact nature of the changes. My view is that any package that is upwards-compatible with Numeric (except for bug fixes of course) should be called Numeric and distributed as such. Any package that is intentionally incompatible with Numeric in some important aspect should not be called Numeric. There is a lot of code out there that builds on Numeric, and some of it is hardly maintained any more, although there are still users around. Those users expect to be able to upgrade Numeric without breaking their code.

Konrad. |
From: Chris B. <Chr...@no...> - 2004-01-20 19:12:48
|
Konrad Hinsen wrote:
> My view is that any package that is upwards-compatible with Numeric (except for bug fixes of course) should be called Numeric and distributed as such. Any package that is intentionally incompatible with Numeric in some important aspect should not be called Numeric.

I absolutely agree with this.

Travis Oliphant wrote:
> 1) change the coercion model to reflect Numarray's choice and eliminate the savespace crutch.
> 2) Add indexing capability to Numeric arrays (similar to Numarray's)
> 3) Improve the interaction between Numeric arrays and scalars.

These all look like backward-incompatible changes, so in that case, I vote for Sci-py-array, or whatever. However, it also looks like these are all moving toward the Numarray API. Is this the case? That would be great, as then Numarray would just be dropped in if/when it is deemed up to the task. It also leaves the door open for some sort of automagic selection of which array to use for a given instance.

> 4) Optimization:

Nothing wrong with that...as long as it's not premature!

> Numarray is making great progress and is quite usable for many purposes. An idea that was championed by some is that the Numeric code base would stay static and be replaced entirely by Numarray.
>
> However, Numeric is currently used in a large installed base. In particular SciPy uses Numeric as its core array. While no doubt numarray arrays will be supported in the future, the speed of the less bulky Numeric arrays and the typical case that we encounter in SciPy of many, small arrays will make it difficult for people to abandon Numeric entirely with its comparatively light-weight arrays.

It was said that making Numarray more efficient with small arrays was a goal of the project...is it still? I'm still unclear on why Numarrays are so much more "heavy"...is it just that no one has taken the time to optimize them, or is there really something inherent (and important) in the design?

> As this has become an important path to success of several projects (both commercial and open) it is absolutely necessary that these issues be addressed.

From the small list above, it looks like what you need is an array that is like a Numarray, but faster for small arrays...Has anyone done an analysis of whether it would be harder to optimize Numarray than to make the above changes to Numeric, and continue to maintain two packages? You probably have, but I thought I'd ask anyway...

> Ultimately I think it will be a wise thing to have two implementations of arrays: one that is fast and lightweight optimized for many relatively small arrays, and another that is optimized for large-scale arrays.

Are these really incompatible goals?

> If most of the community wants to see Numeric go away then we will be forced to bring the Numeric array under the SciPy code-base and own it there.

I think it's quite the opposite... if most of the community wants to see Numeric continue on, it must be maintained (and improved) with little change to the API. If we're all going to switch to Numarray, then the SciPy project can do whatever it wants with Numeric...

In Summary:

- Anything called "Numeric" should have a compatible API to the current version
- I'd much rather have just one N-d array type, preferably one that is part of the Python Standard Library...is that likely to ever happen?
- I also want fast small arrays.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
Chr...@no... |
From: Francesc A. <fa...@op...> - 2004-01-20 20:29:34
|
On Tuesday 20 January 2004 20:11, Chris Barker wrote:
> > As this has become an important path to success of several projects (both commercial and open) it is absolutely necessary that these issues be addressed.
>
> From the small list above, it looks like what you need is an array that is like a Numarray, but faster for small arrays...Has anyone done an analysis of whether it would be harder to optimize Numarray than to make the above changes to Numeric, and continue to maintain two packages? You probably have, but I thought I'd ask anyway...

I agree. An analysis should be done in order to see whether it is better to concentrate on making numarray better for small arrays or on having several array implementations. The problem is if numarray cannot be enhanced enough because of design problems, although I would bet that something can be done to get it close to Numeric performance. And I guess quite a few people on this list would be happy to collaborate in some way or another to achieve this goal. However, as Perry says, in order to do this analysis, the amount of speed-up needed should be estimated first.

I personally feel that it would be worth the effort to try to optimize the small-array case in numarray instead of having to fight against a jungle of Numeric/numarray/python array implementations. I strongly believe that numarray has enough advantages over Numeric to justify the effort of overcoming its present limitations rather than maintaining several packages.

Just my 2 cents,

-- 
Francesc Alted |
From: Colin J. W. <cj...@sy...> - 2004-01-20 22:19:14
|
Travis Oliphant wrote:
>
> Numarray is making great progress and is quite usable for many purposes. An idea that was championed by some is that the Numeric code base would stay static and be replaced entirely by Numarray.

It was my impression that this idea had been generally accepted. It was not just one of the proposals under discussion.

I wonder how many others out there had assumed that, in spite of current speed problems, numarray was the way for the future, and had based their development endeavours on numarray. I did.

To this relative outsider, there seem to have been three groups involved in efforts to provide Python with numerical array capabilities, those connected with Numeric, SciPy and numarray. SciPy would appear to be the most recent addition to the list.

Is there any way that some agreement between these groups can be achieved to restore the hope for a common development path?

This message from Travis Oliphant seems to envisage two paths. Is this the better way to go?

>
> However, Numeric is currently used in a large installed base. In particular SciPy uses Numeric as its core array. While no doubt numarray arrays will be supported in the future, the speed of the less bulky Numeric arrays and the typical case that we encounter in SciPy of many, small arrays will make it difficult for people to abandon Numeric entirely with its comparatively light-weight arrays.
>
> In the development of SciPy we have encountered issues in Numeric that we feel need to be fixed. As this has become an important path to success of several projects (both commercial and open) it is absolutely necessary that these issues be addressed.
>
> The purpose of this email is to assess the attitude of the community regarding how these changes to Numeric should be accomplished.
>
> These are the two options we can see:
> * freeze old Numeric 23.x and make all changes to Numeric 24.x still keeping Numeric separate from SciPy
> * freeze old Numeric 23.x and subsume Numeric into SciPy essentially creating a new SciPy arrayobject that is fast and lightweight. Anybody wanting this new array object would get it by installing scipy_base. Numeric would never change in the future but the array in scipy_base would.
>
> It is not an option to wait for numarray to get fast enough as these issues need to be addressed now. Ultimately I think it will be a wise thing to have two implementations of arrays: one that is fast and lightweight optimized for many relatively small arrays, and another that is optimized for large-scale arrays. Eventually, the use of these two underlying implementations should be automatic and invisible to the user.

Is this "automatic and invisible" practicable, except for trivial examples?

>
> A few of the particular changes we need to make to the Numeric arrayobject are:
>
> 1) change the coercion model to reflect Numarray's choice and eliminate the savespace crutch.
> 2) Add indexing capability to Numeric arrays (similar to Numarray's)
> 3) Improve the interaction between Numeric arrays and scalars.
> 4) Optimization:
>
> Again, these changes are going to be made to some form of the Numeric arrays. What I am really interested in knowing is the attitude of the community towards keeping Numeric around. If most of the community wants to see Numeric go away then we will be forced to bring the Numeric array under the SciPy code-base and own it there.
>
> Your feedback is welcome and appreciated.
> Sincerely,
>
> Travis Oliphant and other SciPy developers

I hope that some cooperative approach can be devised.

Colin W. |
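For reference, the trivial form of "automatic and invisible" selection could look like the sketch below: a factory that picks one of two array classes from the element count alone. This is an assumption-laden illustration, not a proposal from the thread. `LightArray`, `BulkArray`, `make_array` and the 2000-element cut-off are all hypothetical names and values, and the stand-in classes do nothing beyond holding their data.

```python
SMALL_ARRAY_LIMIT = 2000  # hypothetical cut-off, chosen only for illustration


class LightArray:
    """Stand-in for a low-overhead array type aimed at small arrays."""

    def __init__(self, values):
        self.values = list(values)


class BulkArray:
    """Stand-in for an array type tuned for large data."""

    def __init__(self, values):
        self.values = list(values)


def make_array(values, limit=SMALL_ARRAY_LIMIT):
    """Pick an implementation from the element count alone."""
    values = list(values)
    cls = LightArray if len(values) < limit else BulkArray
    return cls(values)


if __name__ == "__main__":
    print(type(make_array(range(10))).__name__)    # LightArray
    print(type(make_array(range(5000))).__name__)  # BulkArray
```

The sketch covers construction only. Which type results when a `LightArray` and a `BulkArray` meet in one expression, or when an operation changes an array's size, is exactly the non-trivial part that a real design would have to settle.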
From: Perry G. <pe...@st...> - 2004-01-21 01:31:48
|
On Tuesday, January 20, 2004, at 05:18 PM, Colin J. Williams wrote:
> Travis Oliphant wrote:
>
>> Numarray is making great progress and is quite usable for many purposes. An idea that was championed by some is that the Numeric code base would stay static and be replaced entirely by Numarray.
>
> It was my impression that this idea had been generally accepted. It was not just one of the proposals under discussion.

I don't think there was ever any formal vote. I think Paul Dubois had accepted the idea, others had a more "wait and see" attitude. Realistically, I think one can safely say that as one might expect, those that already were using Numeric probably were happy with its capabilities and that given normal motivations, there would be significant inertia on the part of well established users (those with a lot of code already) to switch over.

But since it wasn't quite as usable for our needs, we decided that we needed a new version. We had to develop it to support our needs and would have done it regardless. We hoped that it would be suitable for all uses, and we've tried to involve all in the process as much as possible. As you might expect, we've devoted most of our attention to meeting our needs, but we have also expended significant energy trying to meet the needs of the more general community (and we will continue to try to do so within our resources).

I don't know if it is reasonable to expect that a certain outcome has been blessed by all, nor did most of the existing Numeric users ask us to do this. But many did recognize (as Paul Dubois alluded to) that there was a need to recode the array stuff. Maybe someone could have done a better job of it, but no one else has yet (it is a fair amount of work after all). We do intend to support all the important packages that Numeric does, it may take some time to get there. I suppose our goal is to eventually attract all new users. We can't, nor should we expect that existing Numeric users will switch at our desire or whim.

> I wonder how many others out there had assumed that, in spite of current speed problems, numarray was the way for the future, and had based their development endeavours on numarray. I did.
>
> To this relative outsider, there seem to have been three groups involved in efforts to provide Python with numerical array capabilities, those connected with Numeric, SciPy and numarray. SciPy would appear to be the most recent addition to the list.

Actually, I think it would be more accurate to say that SciPy is an attempt to collect a large base of numeric code and integrate it into an array package (currently Numeric) rather than to develop a new array package. It was started before we started numarray and thus was centered around Numeric. They have found occasions to modify and extend Numeric behavior. In that sense, it long has been somewhat incompatible with Numeric. (Travis can correct me if I got that wrong.)

> Is there any way that some agreement between these groups can be achieved to restore the hope for a common development path?

I would certainly like to, and in any case, we want to adapt scipy to be compatible with numarray.

Perry Greenfield |
From: Andrew P. L. Jr. <bs...@al...> - 2004-01-21 02:52:17
|
On Mon, 19 Jan 2004, Travis Oliphant wrote: > ... Ultimately I think it will be a wise thing to have two > implementations of arrays: one that is fast and lightweight optimized > for many relatively small arrays, and another that is optimized for > large-scale arrays. I am *extremely* interested in the use case of the small arrays in SciPy. Which algorithms and modules are dominated by the small array speed? -a |