From: Chris B. <chr...@ho...> - 2001-11-20 23:22:21
Perry Greenfield wrote:

> > One major comment that isn't directly addressed on the web page is the
> > ease of writing new functions, I suppose Ufuncs, although I don't
> > usually care if they work on anything other than Arrays. I hope the new
> > system will make it easier to write new ones.

<snip>

> Absolutely. We will provide examples of how to write new ufuncs. It should
> be very simple in one sense (requiring few lines of code) if our code
> generator machinery is used (but context is important here, so this
> is why examples or a template is extremely important). But it isn't
> particularly hard to do without the code generator. And such ufuncs
> will handle *all* the generality of arrays including slices, non-aligned
> arrays, byteswapped arrays, and type conversion. I'd like to provide
> examples of writing ufuncs within a few weeks (along with examples
> of other kinds of functions using the C-API as well).

This sounds great! The code-generating machinery sounds very promising, and
examples are, of course, key. I found digging through the NumPy source to
figure out how to do things very treacherous. Making Ufuncs easy to write
will encourage a lot more C Ufuncs to be written, which should help
performance.

> > Also, I can't help wondering if this could leverage more existing code.
> > The blitz++ package being used by Eric Jones in the SciPy.compiler
> > project looks very promising. It's probably too late, but I'm wondering
> > what the reasons are for re-inventing such a general purpose wheel.
>
> I'm not sure which "wheel" you are talking about :-)

The wheel I'm talking about is multi-dimensional array objects...

> We certainly
> aren't trying to replicate what Eric Jones has done with the
> SciPy.compiler approach (which is very interesting in its own right).

I know. I just think using an existing set of C++ classes for multiply-typed
multidimensional arrays would make sense, although I imagine it is too late
now!

> If the issue is why we are redoing Numeric:

Actually, I think I had a pretty good idea why you were working on this.

> 1) it has to be rewritten to be acceptable to Guido before it can be
>    part of the Standard Library.
> 2) to add new types (e.g. unsigned) and representations (e.g., non-aligned,
>    byteswapped, odd strides, etc). Using memory mapped data requires some
>    of these.
> 3) to make it more memory efficient with large arrays.
> 4) to make it more generally extensible

I'm particularly excited about 1) and 4).

> > As a whole I have found that I would like the transition from Python to
> > compiled languages to be smoother. The standard answer to Python
> > performance is to profile, and then re-write the computationally
> > intensive portions in C. This would be a whole lot easier if Python used
> > datatypes that are easy to use from C/C++ as well as Python. I hope
> > NumPy2 can move in this direction.
>
> What do you see as missing in numarray in that sense? Aside from UInt32
> I'm not aware of any missing type that is available on all platforms.
> There is the issue of Float128 and such. Adding these is not hard.
> The real issue is how to deal with the platforms that don't support them.

I used poor wording. When I wrote "datatypes", I meant data types in a much
higher-order sense. Perhaps structures or classes would be a better term.
What I mean is that it should be easy to use and manipulate the same
multidimensional arrays from both Python and C/C++.
In the current Numeric, most folks generate a contiguous array, and then
just use the array->data pointer to get what is essentially a C array.
That's fine if you are using it in a traditional C way, with fixed
dimensions, one datatype, etc. What I'm imagining is having an object in C
or C++ that could easily be used as a multidimensional array. I'm thinking
C++ would probably be necessary, and probably templates as well, which is
why blitz++ looked promising. Of course, blitz++ only compiles with a few
up-to-date compilers, so you'd never get it into the standard library that
way! This could also lead the way to being able to compile NumPy
code....<end fantasy>

> I think it is pretty easy to install since it uses distutils.

I agree, but from the newsgroup it is clear that a lot of folks are very
reluctant to use something that is not part of the standard library.

> > > We estimate
> > > that numarray is probably another order of magnitude worse,
> > > i.e., that 20K element arrays are at half the asymptotic
> > > speed. How much should this be improved?
> >
> > A lot. I use arrays smaller than that most of the time!
>
> What is good enough? As fast as current Numeric?

As fast as current Numeric would be "good enough" for me. It would be a
shame to go backwards in performance!

> (IDL does much
> better than that for example).

My personal benchmark is MATLAB, which I imagine is similar to IDL in
performance.

> 10 element arrays will never be
> close to C speed in any array based language embedded in an
> interpreted environment.

Well, sure, I'm not expecting that.

> 100, maybe, but will be very hard.
> 1000 should be possible with some work.

I suppose MATLAB has it easier, as all arrays are doubles and (until
recently, anyway) all variables were arrays, and all arrays were 2-d. NumPy
is a lot more flexible than that. Is it the type and size checking that
takes the time?

> Another approach is to try to cast many of the functions as being
> able to broadcast over repeated small arrays. After all, if one
> is only doing a computation on one small array, it seems unlikely
> that the overhead of Python will be objectionable. Only if you
> have many such arrays to repeat calculations on, should it be
> a problem (or am I wrong about that).

You are probably right about that.

> If these repeated calculations
> can be "assembled" into a higher dimensionality array (which
> I understand isn't always possible) and operated on in that sense,
> the efficiency issue can be dealt with.

I do that when possible, but it's not always possible.

> But I guess this can only
> be seen with specific existing examples and programs. I would
> be interested in seeing the kinds of applications you have now
> to gauge what the most effective solution would be.

One of the things I work with a lot is coordinates of points and polygons.
Sets of points I can handle easily as an Nx2 array, but polygons don't work
so well, as each polygon has a different number of points, so I use a list
of arrays, which I have to loop over. Each polygon can have from about 10 to
thousands of points (mostly 10-20, however).

One way I have dealt with this is to store a polygon set as one large array
of all the points, and another array with the indexes of the start and end
of each polygon. That way I can transform the coordinates of all the
polygons in one operation. It works OK, but sometimes it is more useful to
have them in a sequence.
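In rough terms, the scheme looks something like this (a quick sketch against
the current Numeric; the names "polygons", "points", and "starts" are just
made up for illustration):

import Numeric

# Three small polygons, each an (n_i, 2) array of x,y vertices.
polygons = [Numeric.array([[0., 0.], [1., 0.], [0., 1.]]),
            Numeric.array([[2., 2.], [3., 2.], [3., 3.], [2., 3.]]),
            Numeric.array([[5., 5.], [6., 5.], [5., 7.]])]

# Flatten them into one big (N, 2) array, plus an index array recording
# where each polygon starts (the end of polygon i is starts[i+1]).
points = Numeric.concatenate(polygons)
starts = Numeric.cumsum([0] + [len(p) for p in polygons])

# One vectorized operation now transforms every vertex of every polygon,
# e.g. a uniform scale plus an offset:
points = points * 2.0 + Numeric.array([10.0, 20.0])

# ...but to treat the set as a sequence again, each polygon has to be
# sliced back out:
second = points[starts[1]:starts[2]]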
> As mentioned,
> we tend to deal with large data sets and so I don't think we have
> a lot of such examples ourselves.

I know large datasets were one of your driving factors, but I really don't
want to make performance on smaller datasets secondary.

I hope I'll get a chance to play with it soon....

-Chris

--
Christopher Barker, Ph.D.
Chr...@ho...
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics