From: Jin-chung H. <hs...@st...> - 2004-04-02 19:12:26
> I've been trying to generate a 2-dimensional numarray.records array
> and am rather puzzled about some failures:
>
> I tried a pair of 2x2 arrays for the buffer (one per field). I
> thought the record would get its shape from those, but instead of a
> 2x2 record I get a 2-length record, each of whose elements is a pair
> of 2-length arrays.
>
> >>> arr1 = num.arange(4, shape=(2,2), type=num.Float64)
> >>> arr2 = num.arange(4, shape=(2,2), type=num.Float64) + 10
> >>> a = rec.array([arr1, arr2], names="a,b")
> >>> a
> array(
> [(array([ 0., 1.]), array([ 10., 11.])),
> (array([ 2., 3.]), array([ 12., 13.]))],
> formats=['(2,)Float64', '(2,)Float64'],
> shape=2,
> names=['a', 'b'])
>
> So I tried passing in 4-length arrays while specifying the shape
> explicitly, but it failed. Is this a bug?:
>
> >>> arr1 = num.arange(4, type=num.Float64)
> >>> arr2 = num.arange(4, type=num.Float64) + 10
> >>> a = rec.array([arr1, arr2], shape=(2,2), names="a, b")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/lib/python2.3/site-packages/numarray/records.py", line 384, in array
>     byteorder=byteorder, aligned=aligned)
>   File "/usr/local/lib/python2.3/site-packages/numarray/records.py", line 177, in fromarrays
>     raise ValueError, "array has different lengths"
> ValueError: array has different lengths
>
> Generating a 4-length record and reshaping it does seem to work,
> though there seems to be a bug in __str__ which I'll report:
>
> >>> arr1 = num.arange(4, type=num.Float64)
> >>> arr2 = num.arange(4, type=num.Float64) + 10
> >>> a = rec.array([arr1, arr2], names="a, b")
> >>> a.setshape((2,2))
> >>> a
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/lib/python2.3/site-packages/numarray/records.py", line 718, in __repr__
>     outlist.append(Record.__str__(i))
> TypeError: unbound method __str__() must be called with Record
> instance as first argument (got RecArray instance instead)
> >>> a[1,1]
> <numarray.records.Record instance at 0x15d2ad0>
> >>> str(a[1,1])
> '(1.0, 11.0)'
>
> I see the str problem again if I don't specify any buffer:
>
> >>> import numarray as num
> >>> import numarray.records as rec
> >>> a = rec.array(formats="Float64,Float64", names="a,b", shape=(2,2))
> >>> a
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/lib/python2.3/site-packages/numarray/records.py", line 718, in __repr__
>     outlist.append(Record.__str__(i))
> TypeError: unbound method __str__() must be called with Record
> instance as first argument (got RecArray instance instead)
>
> So...comments? Should I report the shape issues as a bug?
>
> -- Russell

First of all, the records module was developed mainly with the 1-D table in mind. Even though a record array can have more than one dimension, that case is not thoroughly tested, as you have found out. However, I'd argue that in many cases the need to use a 2-D (or higher) table can be substituted by having an array in each cell (element). In your example, instead of creating a 2x2 table with each cell holding just one number, you may be able to use a table with just one row in which each cell is a 2x2 array. You can create such a record like this:

--> arr1 = num.arange(4, shape=(1,2,2), type=num.Float64)
--> arr2 = num.arange(4, shape=(1,2,2), type=num.Float64)+10
--> a = rec.array([arr1, arr2], names="a,b")

I'd be interested in your application as to why a 2x2 table is necessary.

The __str__ method in RecArray is rather primitive but usually works for 1-D tables. Eventually, we'll need a C implementation of this method for both speed and flexibility.

When you use a list for the buffer in the array function, it uses a relatively intuitive (simplistic?) approach: it tries to figure out the (one-dimensional) length of each item in the list and uses that as the record shape.

JC Hsu
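For readers without numarray at hand, the "one row, each cell is a 2x2 array" layout suggested above can be sketched with NumPy structured arrays. This is only an illustration under the assumption that NumPy is an acceptable stand-in; it is not the numarray.records API, and the names `cell_dtype` and `table` are made up for the example.

```python
import numpy as np

# Assumption: a NumPy structured dtype stands in for a numarray.records format.
# Each of the two fields stores a whole 2x2 float64 array per row.
cell_dtype = np.dtype([("a", np.float64, (2, 2)),
                       ("b", np.float64, (2, 2))])

# A table with a single row, as suggested in the message.
table = np.zeros(1, dtype=cell_dtype)
table["a"][0] = np.arange(4, dtype=np.float64).reshape(2, 2)
table["b"][0] = np.arange(4, dtype=np.float64).reshape(2, 2) + 10

print(table.shape)       # shape of the table itself (number of rows)
print(table["a"].shape)  # table shape followed by the per-cell shape
print(table["b"][0])     # the 2x2 array stored in row 0, field "b"
```

In this layout the table stays one-dimensional, which is the case the message says the records module was designed for, while each field still carries 2x2 data.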
From: Jin-chung H. <hs...@st...> - 2004-04-02 21:29:38
> >--> arr1 = num.arange(4, shape=(1,2,2), type=num.Float64)
> >--> arr2 = num.arange(4, shape=(1,2,2), type=num.Float64)+10
> >--> a = rec.array([arr1, arr2], names="a,b")
>
> But is there any advantage to that compared to just using named
> arrays of the desired shape:
> a = num.arange(4, shape=(2,2), type=num.Float64)
> b = num.arange(4, shape=(2,2), type=num.Float64)+10
>

Not really, in this particular example.

> >I'd be interested in your application as to why a 2x2 table is necessary.
>
> Here are two different uses I've come up with (both related to image
> processing).

[snip]

You need it because you need to pass it to the C-structure, I think. In any case, you have found a way to get around the problem by using setshape. I'll take a look at the module to get an idea of how much effort is needed to make 2-D (and higher) record arrays work more smoothly.

JC Hsu
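The workaround mentioned above (build a 1-D record array, then give it a 2x2 shape) can be illustrated roughly as follows. This sketch assumes NumPy structured arrays as a stand-in for numarray.records, with `reshape` playing the role of `setshape`; the names `pair_dtype`, `flat` and `table` are hypothetical.

```python
import numpy as np

# Assumption: NumPy structured arrays stand in for numarray.records.
pair_dtype = np.dtype([("a", np.float64), ("b", np.float64)])

# Step 1: build a flat, 4-element table from two 4-element field arrays.
flat = np.empty(4, dtype=pair_dtype)
flat["a"] = np.arange(4, dtype=np.float64)
flat["b"] = np.arange(4, dtype=np.float64) + 10

# Step 2: reinterpret the flat table as 2x2 (the analogue of setshape((2,2))).
table = flat.reshape(2, 2)

print(table.shape)    # the table is now two-dimensional
print(table[1, 1])    # one record, holding both fields for that cell
print(table["b"])     # one field, viewed as a plain 2x2 array
```

Indexing with `[row, col]` returns one record, while indexing with a field name returns a plain 2x2 array of that field.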
From: Russell E O. <ow...@as...> - 2004-04-02 20:08:18
At 2:12 PM -0500 2004-04-02, Jin-chung Hsu wrote:
>First of all, the records module was developed mainly having the 1-D table in
>mind. Even though it can have higher than one dimension, it is not thoroughly
>tested, as you have found out. However, I'd argue that in many cases the
>need to use a 2-D (or higher) table can be substituted by having an array in
>each cell (element). In your example, instead of creating a 2x2 table with
>each cell just having one number, you may be able to use a table with just
>one row and each cell is a 2x2 array. You can create such a record like this:
>
>--> arr1 = num.arange(4, shape=(1,2,2), type=num.Float64)
>--> arr2 = num.arange(4, shape=(1,2,2), type=num.Float64)+10
>--> a = rec.array([arr1, arr2], names="a,b")

But is there any advantage to that compared to just using named arrays of the desired shape:

a = num.arange(4, shape=(2,2), type=num.Float64)
b = num.arange(4, shape=(2,2), type=num.Float64)+10

>I'd be interested in your application as to why a 2x2 table is necessary.

Here are two different uses I've come up with (both related to image processing). Both are beautifully served by a 2-d records array:

1) Find the centroid of a star. The algorithm I'm using (invented by Jim Gunn, I believe) is to walk across the image, looking for the point of maximum symmetry. At each point, total pixels and a measure of asymmetry are measured in a 3x3 grid centered at that point. The minimum asymmetry in that 3x3 array is then used to determine where to walk next. (At the end a parabolic fit is done to the 3x3 asymmetry data to find the true centroid; up until then it's only known to the nearest pixel.)

In any case...right now I maintain two separate 3x3 arrays (total pixels and asymmetry). Whenever I take a step I shift both arrays and then compute new data for the points which are missing data. It would be cleaner and nicer to maintain one 3x3 records array with fields "totPix" and "asymm". Then the related data sticks together and I only have to shift the data once. (I meant to code it that way from the start, but my early attempts to use numeric.records were a disaster. I have a somewhat better handle on it now and may update my code.)

2) Find all stars on an image. The algorithm I'm using (invented by Jeff Morgan, I believe) is to break an image up into blocks of, say, 5x5 pixels. I then compute information about each "super pixel", such as center of mass, total counts, etc. My C++ code has 12 items of information for each super pixel (including 7 boolean flags) and is written to use a 2-dimensional array, each element of which is a data structure with the appropriate fields. The obvious python equivalent is a numarray.records array. It sure sounds better than trying to keep track of 12 separate arrays!

-- Russell
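The first use case (one 3x3 table with fields "totPix" and "asymm" that is shifted once per step) could look roughly like the sketch below. It assumes NumPy structured arrays rather than numarray.records, and `shift_grid` is a hypothetical helper written for this illustration, not code from the thread. Cells that have no data after the shift are marked with NaN so they can be recomputed.

```python
import numpy as np

# Field names come from the message; the dtype layout itself is an assumption.
GRID_DTYPE = np.dtype([("totPix", np.float64), ("asymm", np.float64)])

def shift_grid(grid, drow, dcol):
    """Return a copy of grid with its data moved by (drow, dcol).

    Both fields move together; cells left without data are set to NaN.
    """
    out = np.empty_like(grid)
    out["totPix"] = np.nan
    out["asymm"] = np.nan
    nrows, ncols = grid.shape
    # Source and destination windows that still overlap after the shift.
    src_r = slice(max(0, -drow), min(nrows, nrows - drow))
    dst_r = slice(max(0, drow), min(nrows, nrows + drow))
    src_c = slice(max(0, -dcol), min(ncols, ncols - dcol))
    dst_c = slice(max(0, dcol), min(ncols, ncols + dcol))
    out[dst_r, dst_c] = grid[src_r, src_c]
    return out

grid = np.zeros((3, 3), dtype=GRID_DTYPE)
grid["totPix"] = np.arange(9, dtype=np.float64).reshape(3, 3)
grid["asymm"] = grid["totPix"] + 100

shifted = shift_grid(grid, 0, 1)          # move all data one column to the right
needs_update = np.isnan(shifted["asymm"])  # cells that must be recomputed
print(shifted["totPix"])
print(needs_update)
```

The second use case would follow the same pattern with a larger dtype, for example a dozen fields with `np.bool_` entries for the flags, though the exact field list is not given in the message.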