From: Chris B. <chr...@ho...> - 2001-11-20 23:22:21
Perry Greenfield wrote:

> > One major comment that isn't directly addressed on the web page is the
> > ease of writing new functions, I suppose Ufuncs, although I don't
> > usually care if they work on anything other than Arrays. I hope the new
> > system will make it easier to write new ones.
<snip>
> Absolutely. We will provide examples of how to write new ufuncs. It should
> be very simple in one sense (requiring few lines of code) if our code
> generator machinery is used (but context is important here so this
> is why examples or a template is extremely important). But it isn't
> particularly hard to do without the code generator. And such ufuncs
> will handle *all* the generality of arrays including slices, non-aligned
> arrays, byteswapped arrays, and type conversion. I'd like to provide
> examples of writing ufuncs within a few weeks (along with examples
> of other kinds of functions using the C-API as well).

This sounds great! The code generating machinery sounds very promising, and
examples are, of course, key. I found digging through the NumPy source to
figure out how to do things very treacherous. Making Ufuncs easy to write
will encourage a lot more C Ufuncs to be written, which should help
performance.

> > Also, I can't help wondering if this could leverage more existing code.
> > The blitz++ package being used by Eric Jones in the SciPy.compiler
> > project looks very promising. It's probably too late, but I'm wondering
> > what the reasons are for re-inventing such a general purpose wheel.
>
> I'm not sure which "wheel" you are talking about :-)

The wheel I'm talking about is multi-dimensional array objects...

> We certainly
> aren't trying to replicate what Eric Jones has done with the
> SciPy.compiler approach (which is very interesting in its own right).

I know. I just think using an existing set of C++ classes for typed
multidimensional arrays would make sense, although I imagine it is too late
now!

> If the issue is why we are redoing Numeric:

Actually, I think I had a pretty good idea why you were working on this.

> 1) it has to be rewritten to be acceptable to Guido before it can be
> part of the Standard Library.
> 2) to add new types (e.g. unsigned) and representations (e.g., non-aligned,
> byteswapped, odd strides, etc). Using memory mapped data requires some
> of these.
> 3) to make it more memory efficient with large arrays.
> 4) to make it more generally extensible

I'm particularly excited about 1) and 4).

> > As a whole I have found that I would like the transition from Python to
> > compiled languages to be smoother. The standard answer to Python
> > performance is to profile, and then re-write the computationally
> > intensive portions in C. This would be a whole lot easier if Python used
> > datatypes that are easy to use from C/C++ as well as Python. I hope
> > NumPy2 can move in this direction.
>
> What do you see as missing in numarray in that sense? Aside from UInt32
> I'm not aware of any missing type that is available on all platforms.
> There is the issue of Float128 and such. Adding these is not hard.
> The real issue is how to deal with the platforms that don't support them.

I used poor wording. When I wrote "datatypes", I meant data types in a much
higher-order sense. Perhaps structures or classes would be a better term.
What I mean is that it should be easy to use and manipulate the same
multidimensional arrays from both Python and C/C++.

In the current Numeric, most folks generate a contiguous array, and then
just use the array->data pointer to get what is essentially a C array.
That's fine if you are using it in a traditional C way, with fixed
dimensions, one datatype, etc. What I'm imagining is having an object in C
or C++ that could easily be used as a multidimensional array. I'm thinking
C++ would probably be necessary, and probably templates as well, which is
why blitz++ looked promising. Of course, blitz++ only compiles with a few
up-to-date compilers, so you'd never get it into the standard library that
way! This could also lead the way to being able to compile NumPy
code.... <end fantasy>

> I think it is pretty easy to install since it uses distutils.

I agree, but from the newsgroup, it is clear that a lot of folks are very
reluctant to use something that is not part of the standard library.

> > > We estimate
> > > that numarray is probably another order of magnitude worse,
> > > i.e., that 20K element arrays are at half the asymptotic
> > > speed. How much should this be improved?
> >
> > A lot. I use arrays smaller than that most of the time!
>
> What is good enough? As fast as current Numeric?

As fast as current Numeric would be "good enough" for me. It would be a
shame to go backwards in performance!

> (IDL does much
> better than that for example).

My personal benchmark is MATLAB, which I imagine is similar to IDL in
performance.

> 10 element arrays will never be
> close to C speed in any array based language embedded in an
> interpreted environment.

Well, sure, I'm not expecting that.

> 100, maybe, but will be very hard.
> 1000 should be possible with some work.

I suppose MATLAB has it easier, as all arrays are doubles and (until
recently, anyway) all variables were arrays, and all arrays were 2-d. NumPy
is a lot more flexible than that. Is it the type and size checking that
takes the time?

> Another approach is to try to cast many of the functions as being
> able to broadcast over repeated small arrays. After all, if one
> is only doing a computation on one small array, it seems unlikely
> that the overhead of Python will be objectionable. Only if you
> have many such arrays to repeat calculations on, should it be
> a problem (or am I wrong about that).

You are probably right about that.

> If these repeated calculations
> can be "assembled" into a higher dimensionality array (which
> I understand isn't always possible) and operated on in that sense,
> the efficiency issue can be dealt with.

I do that when possible, but it's not always possible.

> But I guess this can only
> be seen with specific existing examples and programs. I would
> be interested in seeing the kinds of applications you have now
> to gauge what the most effective solution would be.

One of the things I do a lot with are coordinates of points and polygons.
Sets of points I can handle easily as an NX2 array, but polygons don't work
so well, as each polygon has a different number of points, so I use a list
of arrays, which I have to loop over. Each polygon can have from about 10 to
thousands of points (mostly 10-20, however). One way I have dealt with this
is to store a polygon set as one large array of all the points, and another
array with the indexes of the start and end of each polygon. That way I can
transform the coordinates of all the polygons in one operation. It works OK,
but sometimes it is more useful to have them in a sequence.

> As mentioned,
> we tend to deal with large data sets and so I don't think we have
> a lot of such examples ourselves.

I know large datasets were one of your driving factors, but I really don't
want to make performance on smaller datasets secondary. I hope I'll get a
chance to play with it soon....

-Chris

--
Christopher Barker, Ph.D.
Chr...@ho...
http://members.home.net/barkerlohmann
Oil Spill Modeling / Water Resources Engineering /
Coastal and Fluvial Hydrodynamics
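For reference, the flattened polygon layout Chris describes might look like
this in current Numeric (the names and sample coordinates here are
hypothetical, just to make the idea concrete):

    import Numeric as N

    # All vertices of every polygon in one (N, 2) array...
    points = N.array([[0., 0.], [1., 0.], [1., 1.],             # polygon 0
                      [2., 0.], [3., 0.], [3., 1.], [2., 1.]])  # polygon 1
    # ...plus one index array marking where each polygon starts,
    # with a final sentinel marking the end of the last polygon.
    starts = N.array([0, 3, 7])

    # Transform the coordinates of all polygons in one vectorized operation:
    shifted = points + N.array([10., 20.])

    # Recover polygon i as its own array when a sequence view is needed:
    def polygon(i):
        return points[starts[i]:starts[i + 1]]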
From: Perry G. <pe...@st...> - 2001-11-20 20:28:50
> 6) Should array properties be accessible as public attributes
> instead of through accessor methods?
>
> We don't currently allow public array attributes to make
> the Python code simpler and faster (otherwise we will
> be forced to use __setattr__ and such). This results in
> incompatibility with previous code that uses such attributes.
>
> I prefer the use of public attributes over accessor methods.
>
> --
> Paul Barrett, PhD          Space Telescope Science Institute

The issue of efficiency may not be a problem with Python 2.2 or later, since
it provides new mechanisms that avoid the need to use __setattr__ to solve
this problem (e.g. __slots__, property, __get__, and __set__). So it becomes
more of an issue of which style people prefer, rather than simplicity and
speed of the code.

Perry
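For illustration, the Python 2.2 mechanism Perry mentions lets a single
attribute run arbitrary code on access without a class-wide __setattr__; a
generic sketch, not numarray's actual implementation:

    class Array(object):                 # new-style class (Python 2.2+)
        def __init__(self, shape):
            self._shape = shape

        def _get_shape(self):
            return self._shape

        def _set_shape(self, value):
            # a validation hook can go here without slowing
            # access to any other attribute
            self._shape = tuple(value)

        shape = property(_get_shape, _set_shape)

With this, a.shape reads and assigns like a public attribute while still
funneling through the getter and setter.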
From: Paul B. <Ba...@st...> - 2001-11-19 22:11:20
Perry Greenfield wrote:
>
> An early beta version is available on sourceforge as the
> package Numarray (http://sourceforge.net/projects/numpy/)
>
> Information on the goals, changes in user interface, open issues,
> and design can be found at http://aten.stsci.edu/numarray

6) Should array properties be accessible as public attributes instead of
through accessor methods?

We don't currently allow public array attributes to make the Python code
simpler and faster (otherwise we will be forced to use __setattr__ and
such). This results in incompatibility with previous code that uses such
attributes.

I prefer the use of public attributes over accessor methods.

--
Paul Barrett, PhD          Space Telescope Science Institute
Phone: 410-338-4475        ESS/Science Software Group
FAX: 410-338-4767          Baltimore, MD 21218
From: Scott R. <ra...@ph...> - 2001-11-19 19:46:23
On November 19, 2001 02:36 pm, Jeff Whitaker wrote:
> This is definitely a hardware/compiler dependent feature. I get the
> "right" answer on Solaris (with the forte compiler) but the same "wrong"
> answer as Alessandro on MacOS X/gcc. I've tried fiddling with compiler
> options on my OS X box, to no avail.

But seemingly it is even stranger than this. Here are my results from Debian
unstable using Lapack 3.0 on an Athlon system:

Python 2.1.1 (#1, Nov 11 2001, 18:19:24)
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from LinearAlgebra import *
>>> a=array([[1,0],[0,1]])
>>> b=array([[0,1],[-1,0]])
>>> M=a+b*complex(0,1.0)
>>> Heigenvalues(M)
array([ 0.,  2.])

Scott

--
Scott M. Ransom              Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492        3600 University St., Rm 338
email: ra...@ph...           Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
From: Jeff W. <js...@cd...> - 2001-11-19 19:36:47
On Sun, 18 Nov 2001, Travis Oliphant wrote:

> On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote:
> > Is it a problem of lapack3.0 or of
> > LinearAlgebra.py?
> > ..................... ==> (Eigenvalues should be (0,2))
> >
> > >>> a=array([[1,0],[0,1]])
> > >>> b=array([[0,1],[-1,0]])
> > >>> M=a+b*complex(0,1.0)
> > >>> Heigenvalues(M)
>
> I suspect it is your lapack. On an Athlon running Mandrake Linux with the
> lapack-3.0-9 package, I get:
>
> >>> a=array([[1,0],[0,1]])
> >>> b=array([[0,1],[-1,0]])
> >>> M=a+b*complex(0,1.0)
> >>> Heigenvalues(M)
> array([ 0.,  2.])

This is definitely a hardware/compiler dependent feature. I get the "right"
answer on Solaris (with the forte compiler) but the same "wrong" answer as
Alessandro on MacOS X/gcc. I've tried fiddling with compiler options on my
OS X box, to no avail.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : js...@cd...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
From: Joe H. <jh...@oo...> - 2001-11-19 16:46:15
Just to fill in the blanks, here's what IDL does:

IDL> a = [[1,2,3,4], $
IDL>      [5,6,7,8], $
IDL>      [9,10,11,12], $
IDL>      [13,14,15,16]]
IDL> print, a
       1       2       3       4
       5       6       7       8
       9      10      11      12
      13      14      15      16
IDL> print, a[[1,3],[0,3]]
       2      16

--jh--
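So IDL pairs the index arrays element-wise; it prints 2 rather than 5 only
because IDL's first subscript runs over columns, not rows. Under the same
element-wise pairing in Python's row-major convention, the index lists [1,3]
and [0,3] would select the (row, column) pairs (1,0) and (3,3); a
plain-Python sketch of that reading:

    a = [[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12],
         [13, 14, 15, 16]]

    # Pair the index arrays element-wise: (1, 0) and (3, 3).
    print [a[i][j] for (i, j) in zip([1, 3], [0, 3])]   # -> [5, 16]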
From: Todd A. P. Ph.D. <tp...@ac...> - 2001-11-19 13:57:20
Thanks for all of your work. Things seem to be shaping up nicely. I just
wanted to second some of the concerns below:

> Complex Types:
> ==============
>
> 1) I don't like the idea of complex types being a separate subclass of
> ndarray. This makes them "different." Unless this "difference" can be
> completely hidden (which I doubt), I would prefer complex types to be on
> the same level as other numeric types.
>
> 2) Also, in your C-API, you have a different pointer to the imaginary
> data. I much prefer the way it is done currently to have complex numbers
> represented as an 8-byte, or 16-byte chunk of contiguous memory.

The second comment above is really critical for accessing utility available
in a very large number of numerical libraries. In my view this would "break"
the utility of numpy severely -- recopying arrays both on the way out and
the way in would be extremely cumbersome.

-Todd Alan Pitts
From: Peter V. <Pet...@em...> - 2001-11-19 10:44:37
On Saturday 17 November 2001 00:11 am, you wrote:

> note that my main use of numpy is as a pixel buffer for images. some of
> the changes like avoiding type promotion sound really good to me :]

I have exactly the same application, so I agree with this.

> 7) necessary to add other types?
> yes. i really want unsigned int16 and unsigned int32. all my operations
> are on pixel data, and things can just get messy when i need to treat
> packed color values as signed integers.

Yes please! One of the things that irritates me most about the original
Numeric is that some types are lacking. I think the whole range of data
types should be supported, even if some may be seldom used by most people.

> one other thing i'd like there to be a little focus on is adding my own
> new ufunc operators. for image manipulation i'd like new ufunc operators
> that clamp the results to legal values. i'd be happy to do this myself,
> but i don't believe it's possible with the current Numeric.

I write functions in C that directly access the numeric data. I don't use
the ufunc API. One reason that I do that is that I want my library of
routines to be useful independent of Numeric, so I only have a tiny glue
layer between my C routines and Numeric. I hope that it will still be
possible to do this in the new version.

> the last thing i really really want is for this to be rolled into the
> standard python distribution. that is perhaps the most important aspect
> for me. i do not like requiring the extra dependency for generic numeric
> arrays. :]

I second that!

Cheers, Peter

--
Dr. Peter J. Verveer
Bastiaens Group
Cell Biology and Cell Biophysics Programme
EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
Tel.: +49 6221 387245    Fax: +49 6221 387242
Email: Pet...@em...
From: Nils W. <nw...@me...> - 2001-11-19 07:57:12
Alessandro Mirone schrieb:
>
> Is it a problem of lapack3.0 or of
> LinearAlgebra.py?
> ..................... ==> (Eigenvalues should be (0,2))
>
> >>> a=array([[1,0],[0,1]])
> >>> b=array([[0,1],[-1,0]])
> >>> M=a+b*complex(0,1.0)
> >>> Heigenvalues(M)
> array([-2.30277564, 1.30277564])
> >>> print M
> [[ 1.+0.j 0.+1.j]
>  [ 0.-1.j 1.+0.j]]

On an Athlon running SuSE Linux 7.3 with the lapack-3.0-0 package, I get

[-2.30277564  1.30277564]

Nils
From: Travis O. <oli...@ie...> - 2001-11-19 03:00:33
On Sunday 18 November 2001 09:40 am, Alessandro Mirone wrote:
> Is it a problem of lapack3.0 or of
> LinearAlgebra.py?
> ..................... ==> (Eigenvalues should be (0,2))
>
> >>> a=array([[1,0],[0,1]])
> >>> b=array([[0,1],[-1,0]])
> >>> M=a+b*complex(0,1.0)
> >>> Heigenvalues(M)

I suspect it is your lapack. On an Athlon running Mandrake Linux with the
lapack-3.0-9 package, I get:

>>> a=array([[1,0],[0,1]])
>>> b=array([[0,1],[-1,0]])
>>> M=a+b*complex(0,1.0)
>>> Heigenvalues(M)
array([ 0.,  2.])

-Travis
From: Alessandro M. <ale...@wa...> - 2001-11-18 15:41:07
Is it a problem of lapack3.0 or of LinearAlgebra.py?
(Eigenvalues should be (0,2))

>>> a=array([[1,0],[0,1]])
>>> b=array([[0,1],[-1,0]])
>>> M=a+b*complex(0,1.0)
>>> Heigenvalues(M)
array([-2.30277564, 1.30277564])
>>> print M
[[ 1.+0.j 0.+1.j]
 [ 0.-1.j 1.+0.j]]
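(For reference, the expected eigenvalues follow directly from the
characteristic polynomial of this Hermitian matrix:
det(M - lambda*I) = (1 - lambda)^2 - (i)(-i) = (1 - lambda)^2 - 1 = 0,
so lambda = 0 or lambda = 2, confirming that (0, 2) is the correct answer.)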
From: Perry G. <gre...@ho...> - 2001-11-18 01:22:22
From: Pete Shinners <pe...@sh...>
> 7) necessary to add other types?
> yes. i really want unsigned int16 and unsigned int32. all my operations
> are on pixel data, and things can just get messy when i need to treat
> packed color values as signed integers.

Unsigned int16 is already supported. UInt32 could be done, but raises some
interesting issues with regard to combining with Int32. I don't believe the
current implementation prevents you from carrying around unsigned data in
Int32 arrays. If you are using them as packed color values, do you ever do
any arithmetic operations on them other than to pack and unpack them?

> one other thing i'd like there to be a little focus on is adding my own
> new ufunc operators. for image manipulation i'd like new ufunc operators
> that clamp the results to legal values. i'd be happy to do this myself,
> but i don't believe it's possible with the current Numeric.

It will be possible for users to add their own ufuncs. We will eventually
document how to do so (and it should be fairly simple to do once we give a
few example templates).

Perry
From: Perry G. <gre...@ho...> - 2001-11-18 00:27:05
> > I think that we also don't like that, and after doing the original,
> > somewhat incomplete, implementation using the subarray approach,
> > I began to feel that implementing it in C (albeit using a different
> > approach for the code generation) was probably easier and more
> > elegant than what was done here. So you are very likely to see
> > it integrated as a regular numeric type, with a more C-based
> > implementation.
>
> Sounds good. Is development going to take place on the CVS tree? If so, I
> could help out by committing changes directly.
>
> > > 2) Also, in your C-API, you have a different pointer to the
> > > imaginary data.
> > > I much prefer the way it is done currently to have complex numbers
> > > represented as an 8-byte, or 16-byte chunk of contiguous memory.
> >
> > Any reason not to allow both? (The pointer to the real can be
> > interpreted as either a pointer to 8-byte or 16-byte quantities). It is
> > true that figuring out the imaginary pointer from the real is trivial,
> > so I suppose it really isn't necessary.
>
> I guess the way you've structured the ndarray, it is possible. I figured
> some operations might be faster, but perhaps not if you have two pointers
> running at the same time, anyway.

Well, the C implementation I was thinking of would only use one pointer. The
API could supply both if some algorithms would find it useful to access just
the imaginary data alone. But as mentioned, I don't think it is important to
include, so we could easily get rid of it (and probably should).

> > > Index Arrays:
> > > =============
> > >
> > > 1) For what it's worth, my initial reaction to your indexing scheme
> > > is negative. I would prefer that if
> > >
> > > a = [[1,2,3,4],
> > >      [5,6,7,8],
> > >      [9,10,11,12],
> > >      [13,14,15,16]]
> > >
> > > then
> > >
> > > a[[1,3],[0,3]] returns the sub-matrix:
> > >
> > > [[ 4,  6],
> > >  [12, 14]]
> > >
> > > i.e. the cross-product of [1,3] x [0,3]. This is the way MATLAB
> > > works. I'm not sure what IDL does.
> >
> > I'm afraid I don't understand the example. Could you elaborate
> > a bit more how this is supposed to work? (Or is it possible
> > there is an error? I would understand it if the result were
> > [[5, 8],[13,16]] corresponding to the index pairs
> > [[(1,0),(1,3)],[(3,0),(3,3)]])
>
> The idea is to consider indexing with arrays of integers to be a
> generalization of slice index notation. Simply interpret the slice as an
> array of integers that would be formed by using the range operator.
>
> For example, I would like to see
>
> a[1:5,1:3] be the same thing as a[[1,2,3,4],[1,2]]
>
> a[1:5,1:3] selects the 2-d subarray consisting of rows 1 to 4 and columns
> 1 to 2 (inclusive, starting with the first row being row 0). In other
> words, the indices used to select the elements of a are ordered pairs
> taken from the cross-product of the index sets:
>
> [1,2,3,4] x [1,2] = [(1,1), (1,2), (2,1), (2,2), (3,1), (3,2), (4,1),
> (4,2)]
>
> and these selected elements are structured as a 2-d array of shape (4,2).
>
> Does this make more sense? Indexing would be a natural extension of this
> behavior, but allowing sets that can't necessarily be formed from the
> range function.

I understand this (but is the example in the first message consistent with
this?). This is certainly a reasonable interpretation. But if this is the
way multiple index arrays are interpreted, how does one easily specify
scattered points in a multidimensional array?

The only other alternative I can think of is to use some of the dimensions
of a multidimensional index array as indices for each of the dimensions. For
example, if one wanted to index random points in a 2d array, then supplying
an nx2 array would provide a list of n such points. But I see this as a more
limiting way to do it (and there are often benefits to being able to keep
the indices for different dimensions in separate arrays).

But I think doing what you would like to do is straightforward even with the
existing implementation. For example, if x is a 2d array we could easily
develop a function such that:

x[outer_index_product([1,3,4],[1,5])]  # with a better function name!

The function outer_index_product would return a tuple of two index arrays,
each with a shape of 3x2. These arrays would not take up more space than the
original arrays even though they appear to have a much larger size (the one
dimension is replicated by use of a 0 stride size, so the data buffer is the
same as the original). Would this be acceptable?

In the end, all these indexing behaviors can be provided by different
functions. So it isn't really a question of which one to have and which not
to have. The question is what is supported by the indexing notation? For us,
the behavior we have implemented is far more useful for our applications
than the one you propose. But perhaps we are in the minority, so I'd be very
interested in hearing which indexing interpretation is most useful to the
general community.

> > Why not:
> >
> > ravel(a)[[9,10,11]] ?
>
> sure, that would work, especially if ravel doesn't make a copy of the data
> (which I presume it does not).

Correct.

Perry
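A naive Python version of the hypothetical outer_index_product (without the
0-stride trick Perry describes, which needs C-level support) could look like
this sketch:

    import Numeric as N

    def outer_index_product(rows, cols):
        # Build two index arrays whose element-wise pairs enumerate the
        # full cross product rows x cols, shape (len(rows), len(cols)).
        r = N.array([[i] * len(cols) for i in rows])
        c = N.array([list(cols)] * len(rows))
        return (r, c)

    # x[outer_index_product([1, 3, 4], [1, 5])] would then select the
    # 3x2 cross-product submatrix under the proposed index-array semantics.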
From: Perry G. <gre...@ho...> - 2001-11-17 22:57:27
> -----Original Message-----
>
> What I've seen looks great. You've all done some good work here.

Thanks, you were the origin of some of the ideas used.

> Of course, I do have some feedback. I haven't looked at everything, these
> points have just caught my eye.
>
> Complex Types:
> ==============
>
> 1) I don't like the idea of complex types being a separate subclass of
> ndarray. This makes them "different." Unless this "difference" can be
> completely hidden (which I doubt), I would prefer complex types to be on
> the same level as other numeric types.

I think that we also don't like that, and after doing the original, somewhat
incomplete, implementation using the subclassed approach, I began to feel
that implementing it in C (albeit using a different approach for the code
generation) was probably easier and more elegant than what was done here. So
you are very likely to see it integrated as a regular numeric type, with a
more C-based implementation.

> 2) Also, in your C-API, you have a different pointer to the imaginary
> data. I much prefer the way it is done currently to have complex numbers
> represented as an 8-byte, or 16-byte chunk of contiguous memory.

Any reason not to allow both? (The pointer to the real can be interpreted as
either a pointer to 8-byte or 16-byte quantities). It is true that figuring
out the imaginary pointer from the real is trivial, so I suppose it really
isn't necessary.

> Index Arrays:
> =============
>
> 1) For what it's worth, my initial reaction to your indexing scheme is
> negative. I would prefer that if
>
> a = [[1,2,3,4],
>      [5,6,7,8],
>      [9,10,11,12],
>      [13,14,15,16]]
>
> then
>
> a[[1,3],[0,3]] returns the sub-matrix:
>
> [[ 4,  6],
>  [12, 14]]
>
> i.e. the cross-product of [1,3] x [0,3]. This is the way MATLAB works.
> I'm not sure what IDL does.

I'm afraid I don't understand the example. Could you elaborate a bit more on
how this is supposed to work? (Or is it possible there is an error? I would
understand it if the result were [[5, 8],[13,16]] corresponding to the index
pairs [[(1,0),(1,3)],[(3,0),(3,3)]])

> If I understand your scheme, right now, then I would have to append an
> extra dimension to my indexing arrays to get this behavior, right?
>
> 2) I would like to be able to index the array in a flattened sense as
> well (is that possible?) In other words, it would be nice if
> a[flat(9,10,11)] or something got me the elements 9,10,11 in a
> one-dimensional interpretation of the array.

Why not:

ravel(a)[[9,10,11]] ?

> 3) Why can't you combine slice notation and indexing? Just interpret the
> slice as an index array that would be created from using the range
> operator on the same start, stop, and step objects. Is this the plan?

I think that allowing slicing could be possible. But things were getting
pretty complex as they were, and we wanted to see if there was agreement on
how it was being done so far. It could be extended to handle slices, if
there were a well-defined interpretation. (I think there may be at least two
possible interpretations considered). As for the above, sure, but of course
the slice would have to be shape-consistent with the other index arrays
(under the current scheme).

> That's all for now. I don't mean to be critical, I'm really impressed
> with what works so far. These are just some concerns I have right now.
>
> -Travis Oliphant

Thanks Travis, we're looking for constructive feedback, positive or
negative.

Perry
From: Rob <eu...@ho...> - 2001-11-17 16:05:25
Hi all,

I just got an email from @home yesterday, saying that all customers should
back up their web pages, email, etc. I know they are in bankruptcy, but this
email sounded ominous. I'm wondering if there is some kindly soul who would
want to mirror this site. I'd really love to have this site on Starship
Python, but haven't had any responses to emails to them. I'm continuously
working on more code for the site, so I'd hate to see it go down, even if
temporarily.

Sincerely, Rob.

--
The Numeric Python EM Project
www.members.home.net/europax
From: Travis O. <oli...@ie...> - 2001-11-17 02:41:49
> While we think this version is not yet mature enough for
> most to use in everyday projects, we are interested in
> feedback on the user interface and the open issues (see
> the documents on the web page shown below). We also welcome
> those who would like to contribute to this effort by helping
> with the development or adding libraries.

What I've seen looks great. You've all done some good work here.

Of course, I do have some feedback. I haven't looked at everything; these
points have just caught my eye.

Complex Types:
==============

1) I don't like the idea of complex types being a separate subclass of
ndarray. This makes them "different." Unless this "difference" can be
completely hidden (which I doubt), I would prefer complex types to be on the
same level as other numeric types.

2) Also, in your C-API, you have a different pointer to the imaginary data.
I much prefer the way it is done currently to have complex numbers
represented as an 8-byte, or 16-byte chunk of contiguous memory.

Index Arrays:
=============

1) For what it's worth, my initial reaction to your indexing scheme is
negative. I would prefer that if

a = [[1,2,3,4],
     [5,6,7,8],
     [9,10,11,12],
     [13,14,15,16]]

then

a[[1,3],[0,3]] returns the sub-matrix:

[[ 4,  6],
 [12, 14]]

i.e. the cross-product of [1,3] x [0,3]. This is the way MATLAB works. I'm
not sure what IDL does.

If I understand your scheme, right now, then I would have to append an extra
dimension to my indexing arrays to get this behavior, right?

2) I would like to be able to index the array in a flattened sense as well
(is that possible?) In other words, it would be nice if a[flat(9,10,11)] or
something got me the elements 9,10,11 in a one-dimensional interpretation of
the array.

3) Why can't you combine slice notation and indexing? Just interpret the
slice as an index array that would be created from using the range operator
on the same start, stop, and step objects. Is this the plan?

That's all for now. I don't mean to be critical, I'm really impressed with
what works so far. These are just some concerns I have right now.

-Travis Oliphant
From: Pete S. <pe...@sh...> - 2001-11-16 23:11:25
Perry Greenfield wrote:
> An early beta version is available on sourceforge as the
> package Numarray (http://sourceforge.net/projects/numpy/)
>
> Information on the goals, changes in user interface, open issues,
> and design can be found at http://aten.stsci.edu/numarray

you ask a few questions on the information website, here are some of my
answers for things i "care" about. note that my main use of numpy is as a
pixel buffer for images. some of the changes like avoiding type promotion
sound really good to me :]

5) should the implementation be bulletproof for private vars?
i don't think you should worry about this. as long as the interface is well
defined, i wouldn't worry about protecting users from themselves. i think it
will be the rare numarray user who will be in a situation where they need to
modify the internal C data.

7) necessary to add other types?
yes. i really want unsigned int16 and unsigned int32. all my operations are
on pixel data, and things can just get messy when i need to treat packed
color values as signed integers.

8) negative and out-of-range indices?
i'd prefer them to be kept as similar to python as can be. the current
implementation in Numeric is nice for me.

one other thing i'd like there to be a little focus on is adding my own new
ufunc operators. for image manipulation i'd like new ufunc operators that
clamp the results to legal values. i'd be happy to do this myself, but i
don't believe it's possible with the current Numeric.

the last thing i really really want is for this to be rolled into the
standard python distribution. that is perhaps the most important aspect for
me. i do not like requiring the extra dependency for generic numeric
arrays. :]
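Pending user-defined ufuncs, clamping can at least be approximated in
current Numeric by widening, operating, and clipping; a sketch (the function
name and the UnsignedInt8 round-trip are chosen just for illustration):

    import Numeric as N

    def add_clamped(a, b, lo=0, hi=255):
        # Widen to Int32 so the sum cannot wrap, then clip into range
        # and narrow back to the pixel type.
        total = a.astype(N.Int32) + b.astype(N.Int32)
        return N.clip(total, lo, hi).astype(N.UnsignedInt8)

    pix = N.array([200, 100, 250], N.UnsignedInt8)
    print add_clamped(pix, pix)    # -> [255 200 255]

Unlike a true ufunc, this is a plain Python function (no reduce/outer
methods) and it pays for the intermediate widened array.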
From: Perry G. <pe...@st...> - 2001-11-16 22:32:29
We have been working on a reimplementation of Numeric, the numeric array
manipulation extension module for Python. The reimplementation is virtually
a complete rewrite, and because it is not completely backwards compatible
with Numeric, we have dubbed it numarray to prevent confusion.

While we think this version is not yet mature enough for most to use in
everyday projects, we are interested in feedback on the user interface and
the open issues (see the documents on the web page shown below). We also
welcome those who would like to contribute to this effort by helping with
the development or adding libraries.

An early beta version is available on sourceforge as the package Numarray
(http://sourceforge.net/projects/numpy/)

Information on the goals, changes in user interface, open issues, and design
can be found at http://aten.stsci.edu/numarray
From: Gerard V. <gve...@la...> - 2001-11-15 09:27:17
Hi,

Try to link in the blas library (there is a dcopy_ in my blas library, but
better check the README first).

best regards -- Gerard

On Thursday 15 November 2001 11:15, Nils Wagner wrote:
> I have installed f2py on my system for wrapping existing FORTRAN 77
> codes to Python. [...]
>
> >>> import foo
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ImportError: ./foomodule.so: undefined symbol: dcopy_
>
> Any suggestions to solve this problem ?
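Concretely, that means getting -lblas (and -llapack) onto the link line for
the generated module, for instance by overriding the library list when
running the generated Makefile (the exact variable name may differ between
f2py versions, so check the Makefile first):

    make -f Makefile-foo LIBS="-L/usr/lib -llapack -lblas"

dcopy_ is a BLAS routine, which is why the module links cleanly but fails at
import time when no BLAS library is linked in.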
From: Nils W. <nw...@me...> - 2001-11-15 09:13:18
Hi,

I have installed f2py on my system for wrapping existing FORTRAN 77 codes to
Python. Then I have gone through the following steps.

An example for using a TLS (total least squares) routine from
http://www.netlib.org/vanhuffel/:

2) Get dtsl.f with dependencies
3) Run
   f2py dtsl.f -m foo -h foo.pyf only: dtsl
   (wrap just the dtsl function; -m gives the Python module name,
   -h creates the signature file)
4) Edit foo.pyf to your specific needs (optional)
5) Run
   f2py foo.pyf
   (this will create the Python C/API module foomodule.c)
6) Run
   make -f Makefile-foo
   (this will build the module)
7) In python:

Python 2.1.1 (#1, Sep 24 2001, 05:28:47)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import foo
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: ./foomodule.so: undefined symbol: dcopy_

Any suggestions to solve this problem ?

Nils

There are prebuilt libraries of LAPACK and BLAS in /usr/lib:

-rw-r--r-- 1 root root  657706 Sep 24 01:00 libblas.a
lrwxrwxrwx 1 root root      12 Okt 22 19:27 libblas.so -> libblas.so.2
lrwxrwxrwx 1 root root      16 Okt 22 19:27 libblas.so.2 -> libblas.so.2.2.0
-rwxr-xr-x 1 root root  559600 Sep 24 01:01 libblas.so.2.2.0
-rw-r--r-- 1 root root 5763150 Sep 24 01:00 liblapack.a
lrwxrwxrwx 1 root root      14 Okt 22 19:27 liblapack.so -> liblapack.so.3
lrwxrwxrwx 1 root root      18 Okt 22 19:27 liblapack.so.3 -> liblapack.so.3.0.0
-rwxr-xr-x 1 root root 4826626 Sep 24 01:01 liblapack.so.3.0.0
From: Nils W. <nw...@me...> - 2001-11-14 14:12:50
Konrad Hinsen schrieb:
>
> Nils Wagner <nw...@me...> writes:
>
> > There is a difference between classical least squares (Numpy)
> > and TLS (total least squares).
>
> Algorithmically speaking it is even a very different problem. I'd say
> the only reasonable (i.e. efficient) solution for NumPy is to
> implement the TLS algorithm in a C subroutine calling LAPACK routines
> for SVD etc.
>
> Konrad.

There are two Fortran implementations of the TLS algorithm already available
via http://www.netlib.org/vanhuffel/. Moreover, there is a tool called f2py
that generates Python C/API modules for wrapping Fortran 77/90/95 codes to
Python. Unfortunately I am not very familiar with this tool, so I need some
advice on it.

Thanks in advance

Nils
From: Konrad H. <hi...@cn...> - 2001-11-14 13:29:37
Nils Wagner <nw...@me...> writes:

> There is a difference between classical least squares (Numpy)
> and TLS (total least squares).

Algorithmically speaking it is even a very different problem. I'd say the
only reasonable (i.e. efficient) solution for NumPy is to implement the TLS
algorithm in a C subroutine calling LAPACK routines for SVD etc.

Konrad.

--
Konrad Hinsen                            | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
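Even without a dedicated C routine, the basic TLS solution can be sketched
on top of Numeric's existing SVD binding -- a minimal sketch of the Golub &
Van Loan construction, assuming the generic case where the smallest singular
value is simple and the solution exists:

    import Numeric as N
    import LinearAlgebra as LA

    def tls(A, b):
        # Total least squares for A x ~ b, where A and b both carry
        # errors: take the SVD of the augmented matrix [A | b] and read
        # the solution off the right singular vector belonging to the
        # smallest singular value.
        m, n = A.shape
        C = N.concatenate((A, N.reshape(b, (m, 1))), 1)
        u, s, vt = LA.singular_value_decomposition(C)
        v = vt[-1]            # singular values come back sorted descending
        return -v[:n] / v[n]  # assumes v[n] != 0 (non-degenerate problem)

This is only the classical construction; the Van Huffel routines handle the
degenerate and rank-deficient cases that this sketch ignores.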
From: Nils W. <nw...@me...> - 2001-11-14 12:43:19
Travis Oliphant schrieb:
>
> > How do I solve a Total Least Squares problem in Numpy ?
> > A small example would be appreciated.
> >
> > The TLS problem assumes an overdetermined set of linear equations
> > AX = B, where both the data matrix A as well as the observation
> > matrix B are inaccurate:
>
> X, resids, rank, s = LinearAlgebra.linear_least_squares(A,B)
>
> -Travis

Travis,

There is a difference between classical least squares (Numpy) and TLS (total
least squares). I am attaching a small example for illustration.

Nils
From: <R.M...@ex...> - 2001-11-13 21:52:06
Hello,

So far as I can tell, Numeric.dot(), which uses innerproduct() from
multiarraymodule.c, doesn't call the BLAS, even if Numeric was compiled
against a native BLAS. This means (at least on my machine) that

X = ones((150, 16384), 'd')
C = dot(X, transpose(X))

is about 15 times as slow as the comparable operations in Matlab (v6),
which does, I think, use the native BLAS.

I guess that multiarray.c is not particularly optimised to use the BLAS
because of the difficulties of coping with all sorts of types (float32,
int64, etc.) and with non-contiguous arrays. The inner product is so basic
to most of the work I use Numeric for that a speed up here would make a big
difference.

I'm thinking of patching multiarray.c to use the BLAS when it can, but
before I start, are there good reasons for doing something different? Any
advice gratefully received!

Cheers,
Richard.

--
Department of Computer Science, Exeter University
Voice: +44 1392 264065       R.M...@ex...
Secretary: +44 1392 264061   http://www.dcs.ex.ac.uk/people/reverson
Fax: +44 1392 264067
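A rough way to quantify the gap on any given machine (a timing sketch only;
the 15x figure above is Richard's measurement, not something this snippet
asserts):

    import time
    import Numeric as N

    X = N.ones((150, 16384), 'd')

    t0 = time.time()
    C = N.dot(X, N.transpose(X))
    print "Numeric.dot: %.2f seconds" % (time.time() - t0)

Comparing that number against the same product in a BLAS-backed environment
shows how much a dgemm-based dot() could gain for contiguous double arrays.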
From: Travis O. <oli...@ee...> - 2001-11-13 19:24:21
> How do I solve a Total Least Squares problem in Numpy ?
> A small example would be appreciated.
>
> The TLS problem assumes an overdetermined set of linear equations
> AX = B, where both the data matrix A as well as the observation
> matrix B are inaccurate:

X, resids, rank, s = LinearAlgebra.linear_least_squares(A,B)

-Travis