Re: [Numpy-discussion] Defining custom types


On 10/27/06, Travis Oliphant <oli...@ie...> wrote:
>
> Jonathan Wang wrote:
> > On 10/27/06, *Travis Oliphant* <oli...@ie...
> > <mailto:oli...@ie...>> wrote:
> >
> >     > If I redefine the string function, I encounter another, perhaps
> >     > more serious problem leading to a segfault. I've defined my string
> >     > function to be extremely simple:
> >     > >>> def printer(arr):
> >     > ...   return str(arr[0])
> >     >
> >     > Now, if I try to print an element of the array:
> >     > >>> mxArr[0]
> >     >
> >     > I get this stack trace:
> >     > #0  scalar_value (scalar=0x814be10, descr=0x5079e0) at
> >     > scalartypes.inc.src:68
> >     > #1  0x0079936a in PyArray_Scalar (data=0x814cf98, descr=0x5079e0,
> >     > base=0x814e7a8) at arrayobject.c:1419
> >     > #2  0x007d259f in array_subscript_nice (self=0x814e7a8,
> >     > op=0x804eb8c) at arrayobject.c:1985
> >     > #3  0x00d17dde in PyObject_GetItem (o=0x814e7a8, key=0x804eb8c) at
> >     > Objects/abstract.c:94
> >     >
> >     > (Note: for some reason gdb claims that arrayobject.c:1985 is
> >     > array_subscript_nice, but looking at my source this line is
> >     > actually in array_item_nice. *boggle*)
> >     >
> >     > But scalar_value returns NULL for all non-native types. So,
> >     > destptr in PyArray_Scalar is set to NULL, and the call to
> >     > copyswap segfaults.
> >     >
> >     > Perhaps scalar_value should be checking the scalarkind field of
> >     > PyArray_Descr, or using the elsize and alignment fields to
> >     > figure out the pointer to return if scalarkind isn't set?
> >
> >     Hmmm... It looks like the modifications to scalar_value did not take
> >     into account user-defined types.  I've added a correction so that
> >     user-defined types will use setitem to set the scalar value into the
> >     array.  Presumably your setitem function can handle setting the
> >     array with scalars of your new type?
> >
> >     I've checked the changes into SVN.
> >
> >
> > Do there also need to be changes in scalartypes.inc.src to use getitem
> > if a user-defined type does not inherit from a Numpy scalar?
> This needs to be clarified.  I don't think it's possible to do it
> without inheriting from a numpy scalar at this point (the void numpy
> scalar can be inherited from and is pretty generic).  I know I was not
> considering that case when I wrote the code.
> > i.e. at scalartypes.inc.src:114 we should return some pointer
> > calculated from the PyArray_Descr's elsize and alignment field to get
> > the destination for the "custom scalar" type to be copied.
> I think this is a good idea.  I doubt it's enough to fix all places that
> don't inherit from numpy scalars, but it's a start.
>
> It seems like we need to figure out where the beginning of the data is
> for the type which is assumed to be defined on alignment boundaries
> after a PyObject_HEAD  (right)?  This could actually be used for
> everything and all the switch and if statements eliminated.
>
> I think the alignment field is the only thing needed, though.  I don't
> see how I would use the elsize field?

Hmm, yeah, I guess alignment would be sufficient. Worst case, you could
delegate to setitem, right?
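Travis's suggestion reduces to rounding the size of the object header up to the type's alignment; elsize never enters the calculation. A minimal sketch of that arithmetic, assuming the scalar's data follows PyObject_HEAD with only alignment padding in between. `data_offset` is a hypothetical helper, and the header sizes are illustrative (for example 8 bytes for a refcount plus a type pointer on a 32-bit build, 16 on a 64-bit one):

```python
def data_offset(header_size, alignment):
    """Return the first offset >= header_size that is a multiple of alignment.

    Hypothetical helper: models where a custom scalar's payload would start
    if it is placed after PyObject_HEAD on an `alignment` boundary.
    """
    if header_size < 0 or alignment <= 0:
        raise ValueError("header_size must be >= 0 and alignment > 0")
    # Integer round-up: add alignment - 1, then truncate to a multiple.
    return (header_size + alignment - 1) // alignment * alignment


# Illustrative header sizes only; check sizeof(PyObject) on your own build.
for header, align in [(8, 8), (16, 8), (12, 8), (16, 16)]:
    print(header, align, "->", data_offset(header, align))
```

With a 12-byte header and 8-byte alignment, for instance, the payload would start at offset 16.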

It would be useful to support arbitrary types. Suppose, for example, that I
wanted to make an array of structs. In keeping with the date/time example, I
might want to store a long and a double, the long for days in the Gregorian
calendar and the double for seconds from midnight on that day.
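For the array-of-structs case, NumPy's structured dtypes can already hold a (days, seconds) record without a user-defined type, although elements come back as generic records rather than as date/time objects. A minimal sketch; the field names are made up, and int64/float64 stand in for the C long and double:

```python
import numpy as np

# Hypothetical field names; int64 and float64 stand in for C long and double.
gregorian_dt = np.dtype([("days", np.int64), ("seconds", np.float64)])

arr = np.zeros(3, dtype=gregorian_dt)
# Example values: a day count, and 14:33:57.25 as seconds from midnight.
arr[0] = (732610, 14 * 3600 + 33 * 60 + 57.25)

print(arr.dtype.itemsize)   # 8 + 8 bytes per element
print(arr["days"])          # the whole column of day counts
print(arr[0]["seconds"])    # one field of one record
```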

> > Furthermore it seems like the scalar conversions prefer the builtin
> > types, but it seems to me that the user-defined type should be
> > preferred.
> I'm not sure what this means.
> >
> >
> > i.e. if I try to get an element from my mxDateTime array, I get a
> > float back:
> > >>> mxArr[0] = DateTime.now()
> > >>> mxArr[0][0]
> > 732610.60691268521
> Why can you index mxArr[0]?  What is mxArr[0]?  If it's a scalar, then
> why can you index it?  What is type(mxArr[0])?

Ah, I am mistaken here - I am correctly getting my mxNumpyDateTime type
back:

mxArr is a 1x1 two-dimensional array, so mxArr[0] is a row and mxArr[0][0]
is the element itself:

>>> mxArr = numpy.empty((1,1), dtype = libMxNumpy.type)
>>> mxArr[0] = DateTime.now()
>>> type(mxArr)
<type 'numpy.ndarray'>
>>> type(mxArr[0])
<type 'numpy.ndarray'>
>>> type(mxArr[0][0])
<type 'mxNumpyDateTime'>
>>> mxArr.shape
(1, 1)

> > But what I really want is the mxDateTime, which, oddly enough, is what
> > happens if I use tolist():
> > >>> mxArr.tolist()[0]
> > [<DateTime object for '2006-10-27 14:33:57.25' at a73c60>]
>
> That's not surprising because tolist just calls getitem on each element
> in the array to construct the list.
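The two access paths can be compared with a built-in dtype. In the sketch below, `tolist_via_item` is a hypothetical helper that rebuilds `tolist()` from per-element `ndarray.item()` calls, which likewise convert through the dtype's getitem and return plain Python objects; ordinary indexing instead returns a NumPy scalar, built along the PyArray_Scalar path seen in the stack trace above:

```python
import numpy as np


def tolist_via_item(arr, prefix=()):
    """Rebuild arr.tolist() by calling arr.item() on every element.

    Hypothetical helper: recurses one axis at a time, so the nesting
    matches tolist() for arrays of any dimension.
    """
    depth = len(prefix)
    if depth == arr.ndim:
        # All indices fixed: item() returns a plain Python object.
        return arr.item(*prefix)
    return [tolist_via_item(arr, prefix + (i,)) for i in range(arr.shape[depth])]


a = np.arange(6, dtype=np.float64).reshape(2, 3)
print(tolist_via_item(a) == a.tolist())
print(type(a[0, 0]))            # NumPy scalar from indexing
print(type(a.tolist()[0][0]))   # Python float from getitem
```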

I guess this is a degenerate case, since I have getitem returning an
mxDateTime while the actual type of the elements in the array is
mxNumpyDateTime (i.e. mxNumpyType). Would the correct behavior, then, be for
getitem to return an mxNumpyDateTime and register the object cast function to
return an mxDateTime?

If I try to do math on the array, the operation seems to be performed on
the objects themselves: the result is a DateTimeDelta object, which is what
mxDateTime - mxDateTime returns, rather than the float that mxNumpyDateTime
holds:

>>> mxArr = numpy.empty((1,1), dtype = libMxNumpy.type)
>>> mxArr[0][0] = DateTime.now()
>>> mxArr2 = numpy.empty((1,1), dtype = libMxNumpy.type)
>>> mxArr2[0][0] = DateTime.DateTimeFrom('2006-01-01')
>>> type(mxArr[0][0])
<type 'mxNumpyDateTime'>
>>> type(mxArr2[0][0])
<type 'mxNumpyDateTime'>
>>> sub = mxArr - mxArr2
>>> type(sub[0][0])
<type 'DateTimeDelta'>

I'm guessing I need to register ufunc loops for all the basic math on my
types?
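The DateTimeDelta result is what an object loop would produce: for dtype=object arrays, NumPy applies each element's own `__sub__`. A minimal sketch with the standard library's `datetime` standing in for mxDateTime (an illustrative substitute, not the mx types) shows the same pattern, a `timedelta` coming back instead of a number:

```python
from datetime import datetime, timedelta

import numpy as np

# datetime stands in for mxDateTime; subtraction yields a timedelta,
# much as mxDateTime - mxDateTime yields a DateTimeDelta.
arr = np.empty((1, 1), dtype=object)
arr[0][0] = datetime(2006, 10, 27, 14, 33, 57, 250000)
arr2 = np.empty((1, 1), dtype=object)
arr2[0][0] = datetime(2006, 1, 1)

sub = arr - arr2          # object loop: calls datetime.__sub__ per element
print(sub.dtype)          # object
print(type(sub[0][0]))    # a timedelta, not a float
```

Getting a float back instead would require a dedicated loop for the custom type; at the C level, NumPy's `PyUFunc_RegisterLoopForType` is the function for registering such a loop with an existing ufunc.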
