RE: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
From: Perry G. <gre...@ho...> - 2001-11-18 00:27:05
> > I think that we also don't like that, and after doing the original,
> > somewhat incomplete, implementation using the subarray approach,
> > I began to feel that implementing it in C (albeit using a different
> > approach for the code generation) was probably easier and more
> > elegant than what was done here. So you are very likely to see
> > it integrated as a regular numeric type, with a more C-based
> > implementation.
>
> Sounds good. Is development going to take place on the CVS tree? If so,
> I could help out by committing changes directly.
>
> > > 2) Also, in your C-API, you have a different pointer to the
> > > imaginary data. I much prefer the way it is done currently to have
> > > complex numbers represented as an 8-byte, or 16-byte chunk of
> > > contiguous memory.
> >
> > Any reason not to allow both? (The pointer to the real can be
> > interpreted as either a pointer to 8-byte or 16-byte quantities.)
> > It is true that figuring out the imaginary pointer from the real is
> > trivial, so I suppose it really isn't necessary.
>
> I guess the way you've structured the ndarray, it is possible. I figured
> some operations might be faster, but perhaps not if you have two
> pointers running at the same time, anyway.

Well, the C implementation I was thinking of would only use one pointer.
The API could supply both if some algorithms would find it useful to
access the imaginary data alone. But as mentioned, I don't think it is
important to include, so we could easily get rid of it (and probably
should).

> > > Index Arrays:
> > > =============
> > >
> > > 1) For what it's worth, my initial reaction to your indexing scheme
> > > is negative. I would prefer that if
> > >
> > >     a = [[1,2,3,4],
> > >          [5,6,7,8],
> > >          [9,10,11,12],
> > >          [13,14,15,16]]
> > >
> > > then
> > >
> > >     a[[1,3],[0,3]] returns the sub-matrix:
> > >
> > >     [[ 4,  6],
> > >      [12, 14]]
> > >
> > > i.e. the cross-product of [1,3] x [0,3]. This is the way MATLAB
> > > works. I'm not sure what IDL does.
> >
> > I'm afraid I don't understand the example. Could you elaborate
> > a bit more how this is supposed to work? (Or is it possible
> > there is an error? I would understand it if the result were
> > [[5, 8], [13, 16]], corresponding to the index pairs
> > [[(1,0),(1,3)],[(3,0),(3,3)]].)
>
> The idea is to consider indexing with arrays of integers to be a
> generalization of slice index notation. Simply interpret the slice as
> an array of integers that would be formed by using the range operator.
>
> For example, I would like to see
>
>     a[1:5,1:3] be the same thing as a[[1,2,3,4],[1,2]]
>
> a[1:5,1:3] selects the 2-d subarray consisting of rows 1 to 4 and
> columns 1 to 2 (inclusive, starting with the first row being row 0).
> In other words, the indices used to select the elements of a are
> ordered pairs taken from the cross-product of the index sets:
>
>     [1,2,3,4] x [1,2] = [(1,1), (1,2), (2,1), (2,2), (3,1), (3,2),
>                          (4,1), (4,2)]
>
> and these selected elements are structured as a 2-d array of shape
> (4,2).
>
> Does this make more sense? Indexing would be a natural extension of
> this behavior, but allowing sets that can't necessarily be formed from
> the range function.

I understand this (but is the example in the first message consistent
with this?). This is certainly a reasonable interpretation. But if this
is the way multiple index arrays are interpreted, how does one easily
specify scattered points in a multidimensional array? The only other
alternative I can think of is to use some of the dimensions of a
multidimensional index array as indices for each of the dimensions. For
example, if one wanted to index random points in a 2-d array, then
supplying an n x 2 array would provide a list of n such points. But I
see this as a more limiting way to do this (and there are often benefits
to being able to keep the indices for different dimensions in separate
arrays).

But I think doing what you would like to do is straightforward even with
the existing implementation. For example, if x is a 2-d array we could
easily develop a function such that:

    x[outer_index_product([1,3,4],[1,5])]  # with a better function name!

The function outer_index_product would return a tuple of two index
arrays, each with a shape of 3x2. These arrays would not take up more
space than the original arrays even though they appear to have a much
larger size (the one dimension is replicated by use of a 0 stride size,
so the data buffer is the same as the original). Would this be
acceptable?

In the end, all these indexing behaviors can be provided by different
functions, so it isn't really a question of which one to have and which
not to have. The question is what is supported by the indexing notation.
For us, the behavior we have implemented is far more useful for our
applications than the one you propose. But perhaps we are in the
minority, so I'd be very interested in hearing which indexing
interpretation is most useful to the general community.

> > Why not:
> >
> >     ravel(a)[[9,10,11]] ?
>
> sure, that would work, especially if ravel doesn't make a copy of the
> data (which I presume it does not).

Correct.

Perry
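[Editor's note: the two indexing interpretations debated above both survive in modern NumPy, where `np.ix_` plays exactly the role of the proposed `outer_index_product`. A minimal sketch, assuming current NumPy rather than 2001-era Numeric:]

```python
import numpy as np

a = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]])

# Pairwise interpretation (the one numarray implemented): the index
# lists are zipped into coordinate pairs (1,0) and (3,3).
pairs = a[[1, 3], [0, 3]]
print(pairs)    # [ 5 16]

# Cross-product (MATLAB-style) interpretation via np.ix_, the modern
# equivalent of the outer_index_product sketched above: rows 1,3
# crossed with columns 0,3.
grid = a[np.ix_([1, 3], [0, 3])]
print(grid)     # [[ 5  8]
                #  [13 16]]
```

As Perry notes, the cross-product arrays returned by `np.ix_` have shapes (2,1) and (1,2) and are expanded by broadcasting, so no extra index data is materialized.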
Re: [Numpy-discussion] Re-implementation of Python Numerical arrays (Numeric) available for download
From: Perry G. <gre...@ho...> - 2001-11-18 01:22:22
From: Pete Shinners <pe...@sh...>

> 7) necessary to add other types?
>
> yes. i really want unsigned int16 and unsigned int32. all my operations
> are on pixel data, and things can just get messy when i need to treat
> packed color values as signed integers.

Unsigned int16 is already supported. UInt32 could be done, but raises
some interesting issues with regard to combining with Int32. I don't
believe the current implementation prevents you from carrying around
unsigned data in Int32 arrays. If you are using them as packed color
values, do you ever do any arithmetic operations on them other than to
pack and unpack them?

> one other thing i'd like there to be a little focus on is adding my own
> new ufunc operators. for image manipulation i'd like new ufunc
> operators that clamp the results to legal values. i'd be happy to do
> this myself, but i don't believe it's possible with the current
> Numeric.

It will be possible for users to add their own ufuncs. We will
eventually document how to do so (and it should be fairly simple to do
once we give a few example templates).

Perry
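[Editor's note: the clamping operator Pete asks for can be sketched in modern NumPy without writing a C ufunc. `clamped_add` is an illustrative name, not an actual Numeric/numarray function; it shows saturating pixel addition versus the default wrap-around:]

```python
import numpy as np

def clamped_add(a, b):
    # Widen to a larger type so the sum cannot overflow, clip to the
    # legal 8-bit pixel range, then narrow back to uint8.
    result = a.astype(np.uint16) + b.astype(np.uint16)
    return np.clip(result, 0, 255).astype(np.uint8)

pixels = np.array([200, 100, 250], dtype=np.uint8)
gain   = np.array([100, 100, 100], dtype=np.uint8)

print(pixels + gain)              # wraps modulo 256: [ 44 200  94]
print(clamped_add(pixels, gain))  # saturates:        [255 200 255]
```

A true user-defined ufunc (with `reduce`, `outer`, etc.) would still need the C-level template mechanism Perry mentions.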
From: Joe H. <jh...@oo...> - 2001-11-19 16:46:15
Just to fill in the blanks, here's what IDL does:

    IDL> a = [[1,2,3,4], $
    IDL>      [5,6,7,8], $
    IDL>      [9,10,11,12], $
    IDL>      [13,14,15,16]]
    IDL> print, a
           1       2       3       4
           5       6       7       8
           9      10      11      12
          13      14      15      16
    IDL> print, a[[1,3],[0,3]]
           2      16

--jh--
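[Editor's note: IDL's answer shows it uses the pairwise interpretation, not MATLAB's cross-product, but with its first index running over columns (IDL is column-major): `a[[1,3],[0,3]]` selects the pairs (col=1, row=0) and (col=3, row=3). A sketch reproducing the result in row-major NumPy by swapping the index roles:]

```python
import numpy as np

a = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]])

# IDL's a[[1,3],[0,3]] pairs (col 1, row 0) and (col 3, row 3);
# in NumPy that is rows [0,3], columns [1,3]:
idl_result = a[[0, 3], [1, 3]]
print(idl_result)   # [ 2 16]
```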
[Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download
From: Perry G. <pe...@st...> - 2001-11-20 20:28:50
> 6) Should array properties be accessible as public attributes
>    instead of through accessor methods?
>
>    We don't currently allow public array attributes to make
>    the Python code simpler and faster (otherwise we will
>    be forced to use __setattr__ and such). This results in
>    incompatibility with previous code that uses such attributes.
>
> I prefer the use of public attributes over accessor methods.
>
> --
> Paul Barrett, PhD  Space Telescope Science Institute

The issue of efficiency may not be a problem with Python 2.2 or later,
since it provides new mechanisms that avoid the need to use __setattr__
to solve this problem (e.g. __slots__, property, __get__, and __set__).
So it becomes more of an issue of which style people prefer rather than
simplicity and speed of the code.

Perry
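[Editor's note: a minimal sketch of the mechanism Perry refers to, using `property` (modern decorator syntax; Python 2.2 would spell it `shape = property(...)`). The class and attribute names are illustrative, not the actual numarray API:]

```python
from math import prod

class NDArray:
    """Toy array exposing .shape as a validated public attribute."""

    def __init__(self, shape):
        self._shape = tuple(shape)

    @property
    def shape(self):
        # Attribute-style read (a.shape), no accessor-method call.
        return self._shape

    @shape.setter
    def shape(self, new_shape):
        # Validation runs only on assignment to .shape; unlike a
        # catch-all __setattr__, other attribute writes stay fast.
        new_shape = tuple(new_shape)
        if prod(new_shape) != prod(self._shape):
            raise ValueError("total size of new array must be unchanged")
        self._shape = new_shape

a = NDArray((4, 2))
print(a.shape)      # (4, 2)
a.shape = (2, 4)    # reshape via plain attribute assignment
print(a.shape)      # (2, 4)
```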
[Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download
From: Perry G. <pe...@st...> - 2001-11-26 20:58:42
> From: Chris Barker <chr...@ho...>
> To: Perry Greenfield <pe...@st...>, num...@li...
> Subject: [Numpy-discussion] Re: Re-implementation of Python Numerical
>          arrays (Numeric) available for download
>
> I used poor wording. When I wrote "datatypes", I meant data types in a
> much higher order sense. Perhaps structures or classes would be a
> better term. What I mean is that it should be easy to use and
> manipulate the same multidimensional arrays from both Python and C/C++.
> In the current Numeric, most folks generate a contiguous array, and
> then just use the array->data pointer to get what is essentially a C
> array. That's fine if you are using it in a traditional C way, with
> fixed dimensions, one datatype, etc. What I'm imagining is having an
> object in C or C++ that could be easily used as a multidimensional
> array. I'm thinking C++ would probably be necessary, and probably
> templates as well, which is why blitz++ looked promising. Of course,
> blitz++ only compiles with a few up-to-date compilers, so you'd never
> get it into the standard library that way!

Yes, that was an important issue (C++ and the Python Standard Library).
And yes, it is not terribly convenient to access multi-dimensional
arrays of varying sizes in C. We don't solve that problem in the way a
C++ library could. But I suppose that some might say that C++ libraries
may introduce their own, new problems. Coming up with the one solution
to all scientific computing appears well beyond our grasp at the moment.
If someone does see that solution, let us know!

> I agree, but from the newsgroup, it is clear that a lot of folks are
> very reluctant to use something that is not part of the standard
> library.

We agree that getting into the standard library is important.

> > > > We estimate that numarray is probably another order of magnitude
> > > > worse, i.e., that 20K element arrays are at half the asymptotic
> > > > speed. How much should this be improved?
> > >
> > > A lot. I use arrays smaller than that most of the time!
> >
> > What is good enough? As fast as current Numeric?
>
> As fast as current Numeric would be "good enough" for me. It would be a
> shame to go backwards in performance!
>
> > (IDL does much better than that, for example.)
>
> My personal benchmark is MATLAB, which I imagine is similar to IDL in
> performance.

We'll see if we can match current performance (or at least present
usable alternative approaches that are faster).

> > 10 element arrays will never be close to C speed in any array based
> > language embedded in an interpreted environment.
>
> Well, sure, I'm not expecting that

Good :-)

> > 100, maybe, but will be very hard. 1000 should be possible with some
> > work.
>
> I suppose MATLAB has it easier, as all arrays are doubles, and (until
> recently anyway) all variables were arrays, and all arrays were 2-d.
> NumPy is a lot more flexible than that. Is it the type and size
> checking that takes the time?

Probably, but we haven't started serious benchmarking yet, so I wouldn't
put much stock in what I say now.

> One of the things I do a lot with are coordinates of points and
> polygons. Sets of points I can handle easily as an Nx2 array, but
> polygons don't work so well, as each polygon has a different number of
> points, so I use a list of arrays, which I have to loop over. Each
> polygon can have from about 10 to thousands of points (mostly 10-20,
> however). One way I have dealt with this is to store a polygon set as a
> large array of all the points, and another array with the indexes of
> the start and end of each polygon. That way I can transform the
> coordinates of all the polygons in one operation. It works OK, but
> sometimes it is more useful to have them in a sequence.

This is a good example of an ensemble of variable sized arrays.

> > As mentioned, we tend to deal with large data sets and so I don't
> > think we have a lot of such examples ourselves.
>
> I know large datasets were one of your driving factors, but I really
> don't want to make performance on smaller datasets secondary.
>
> --
> Christopher Barker

That's why we are asking, and it seems so far that there are enough of
those that do care about small arrays to spend the effort to
significantly improve the performance.

Perry
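[Editor's note: the packed-polygon scheme Chris describes can be sketched in modern NumPy. The data and the `offsets` convention are illustrative; polygon i occupies `points[offsets[i]:offsets[i+1]]`:]

```python
import numpy as np

# All vertices of all polygons packed into one (N, 2) array, plus an
# offsets array marking each polygon's start (and the total length).
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0],               # polygon 0
                   [2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 3.0]])  # polygon 1
offsets = np.array([0, 3, 7])

# Transform every polygon in a single vectorized operation
# (here: translate all coordinates by (10, 20)).
points += np.array([10.0, 20.0])

# Recover the per-polygon sequence only when it is actually needed.
polygons = [points[offsets[i]:offsets[i + 1]]
            for i in range(len(offsets) - 1)]
print(polygons[1][0])   # first vertex of polygon 1: [12. 22.]
```

This trades the convenience of a list of arrays for one bulk operation over the coordinates, which is exactly the small-array-overhead workaround discussed above.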