From: Jonathan W. <jon...@gm...> - 2006-10-26 22:26:52
|
I'm trying to write a Numpy extension that will encapsulate mxDateTime as a native Numpy type. I've decided to use a type inherited from Numpy's scalar double. However, I'm running into all sorts of problems. I'm using numpy 1.0b5; I realize this is somewhat out of date. For all the examples below, assume that I've created a 1x1 array, mxArr, with my custom type. The interface used by Array_FromPyScalar does not conform with the documentation's claim that a negative return value indicates an error. The return code from setitem is not checked. Instead, the code depends on a Python error being set. I seem to be able to load values into the array, but I can't extract anything out of the array, even to print it. In gdb I've verified that loading DateTime.now() correctly puts a float representation of the date into my array. However, if I try to get the value out, I get an error: >>> mxArr[0] = DateTime.now() >>> mxArr[0] Traceback (most recent call last): File "<stdin>", line 1, in ? File "/usr/lib/python2.4/site-packages/numpy/core/numeric.py", line 391, in array_repr ', ', "array(") File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", line 204, in array2string separator, prefix) File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", line 160, in _array2string format = _floatFormat(data, precision, suppress_small) File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", line 281, in _floatFormat non_zero = _uf.absolute(data.compress(_uf.not_equal(data, 0))) TypeError: bad operand type for abs() I'm not sure why it's trying to call abs() on my object to print it. I have a separate PyNumberMethods attached to my object type, copied from the float scalar type, and nb_absolute is set to 0. When I break at the various functions I've registered, the last thing Numpy tries to do is cast my custom data type to an object type (which it does so successfully) via _broadcast_cast. Thanks, Jonathan |
From: Travis O. <oli...@ee...> - 2006-10-26 23:19:25
|
Jonathan Wang wrote: > I'm trying to write a Numpy extension that will encapsulate mxDateTime > as a native Numpy type. I've decided to use a type inherited from > Numpy's scalar double. However, I'm running into all sorts of > problems. I'm using numpy 1.0b5; I realize this is somewhat out of date. > Cool. The ability to create your own data-types (and define ufuncs for them) is a feature that I'd like to see explored. But, it has not received a lot of attention and so you may find bugs along the way. We'll try to fix them quickly as they arise (and there will be bug fix releases for 1.0). But, what do you mean "inheriting" from NumPy's double for your scalar data-type. This has significant implications. To define a new data-type object (that doesn't build from the VOID data-type), you need to flesh out the PyArray_Descr * structure and this can only be done in C. Perhaps you are borrowing most entries in the structure builtin double type and then filling in a few differently like setitem and getitem? Is that accurate? > For all the examples below, assume that I've created a 1x1 array, > mxArr, with my custom type. > > The interface used by Array_FromPyScalar does not conform with the > documentation's claim that a negative return value indicates an error. You must be talking about a different function. Array_FromPyScalar is an internal function and not a C-API call. It also returns a PyObject * not an integer. So, which function are you actually referring to? > The return code from setitem is not checked. Instead, the code depends > on a Python error being set. This may be true, but how is it a problem? > > I seem to be able to load values into the array, but I can't extract > anything out of the array, even to print it. In gdb I've verified that > loading DateTime.now() correctly puts a float representation of the > date into my array. However, if I try to get the value out, I get an > error: > >>> mxArr[0] = DateTime.now() > >>> mxArr[0] > Traceback (most recent call last): > File "<stdin>", line 1, in ? > File "/usr/lib/python2.4/site-packages/numpy/core/numeric.py", line > 391, in array_repr > ', ', "array(") > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > line 204, in array2string > separator, prefix) > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > line 160, in _array2string > format = _floatFormat(data, precision, suppress_small) > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > line 281, in _floatFormat > non_zero = _uf.absolute(data.compress(_uf.not_equal(data, 0))) > TypeError: bad operand type for abs() > > I'm not sure why it's trying to call abs() on my object to print it. Because that's the implication of inheriting from a double. It's just part of the code that tries to format your values into an array (notice the _floatFormat). I actually borrowed this code from numarray so I can't speak to exactly what it's doing without more study. > I have a separate PyNumberMethods attached to my object type, copied > from the float scalar type, and nb_absolute is set to 0. When I break > at the various functions I've registered, the last thing Numpy tries > to do is cast my custom data type to an object type (which it does so > successfully) via _broadcast_cast. Don't confuse the Python object associated when an element of the array is extracted and the data-type of the array. Also don't confuse the PyNumberMethods of the scalar object with the ufuncs. Defining PyNumberMethods won't usually give you the ability to calculate ufuncs. Perhaps you just want to construct an "object" array of mxDateTime's. What is the reason you want to define an mxDateTime data-type? -Travis |
From: Jonathan W. <jon...@gm...> - 2006-10-26 23:37:47
|
On 10/26/06, Travis Oliphant <oli...@ee...> wrote: > > But, what do you mean "inheriting" from NumPy's double for your scalar > data-type. This has significant implications. To define a new > data-type object (that doesn't build from the VOID data-type), you need > to flesh out the PyArray_Descr * structure and this can only be done in > C. Perhaps you are borrowing most entries in the structure builtin > double type and then filling in a few differently like setitem and > getitem? Is that accurate? Sorry, I should have been clearer. When I talk about inheritance, I mean of the type underlying the array. For example, the built-in scalar double array has an underlying type of PyDoubleArrType_Type. My underlying type is a separate PyTypeObject. The interesting changes here are to tp_repr, tp_str, and tp_as_number. The rest of the fields are inherited from PyDoubleArrType_Type using the tp_base field. The array itself has another statically defined type object of type PyArray_Descr, which I'm creating with a PyObject_New call and filling in with many of the entries from the descriptor returned by PyArray_DescrFromType(NPY_DOUBLE), while overriding getitem and setitem to handle PyObject* of type mxDateTime as you guessed. > The interface used by Array_FromPyScalar does not conform with the > > documentation's claim that a negative return value indicates an error. > > You must be talking about a different function. Array_FromPyScalar is > an internal function and not a C-API call. It also returns a PyObject * > not an integer. So, which function are you actually referring to? > > > The return code from setitem is not checked. Instead, the code depends > > on a Python error being set. This may be true, but how is it a problem? > It's just confusing as the documentation indicates that the setitem function should return 0 for success and a negative number for failure. But within Array_FromPyScalar, we have: ret->descr->f->setitem(op, ret->data, ret); if (PyErr_Occurred()) { Py_DECREF(ret); return NULL; } else { return (PyObject *)ret; } So, someone reading the documentation could return -1 on failure without setting the Python error flag, and the function would happily continue on its way and fail to perform the proper casts. > > > I seem to be able to load values into the array, but I can't extract > > anything out of the array, even to print it. In gdb I've verified that > > loading DateTime.now() correctly puts a float representation of the > > date into my array. However, if I try to get the value out, I get an > > error: > > >>> mxArr[0] = DateTime.now() > > >>> mxArr[0] > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > File "/usr/lib/python2.4/site-packages/numpy/core/numeric.py", line > > 391, in array_repr > > ', ', "array(") > > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > > line 204, in array2string > > separator, prefix) > > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > > line 160, in _array2string > > format = _floatFormat(data, precision, suppress_small) > > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > > line 281, in _floatFormat > > non_zero = _uf.absolute(data.compress(_uf.not_equal(data, 0))) > > TypeError: bad operand type for abs() > > > > I'm not sure why it's trying to call abs() on my object to print it. > > Because that's the implication of inheriting from a double. It's just > part of the code that tries to format your values into an array (notice > the _floatFormat). I actually borrowed this code from numarray so I > can't speak to exactly what it's doing without more study. Hmm, so does Numpy ignore the tp_repr and tp_str fields in the PyTypeObject of the underlying type? I admittedly haven't had a chance to look at this code closely yet. > I have a separate PyNumberMethods attached to my object type, copied > > from the float scalar type, and nb_absolute is set to 0. When I break > > at the various functions I've registered, the last thing Numpy tries > > to do is cast my custom data type to an object type (which it does so > > successfully) via _broadcast_cast. > > Don't confuse the Python object associated when an element of the array > is extracted and the data-type of the array. Also don't confuse the > PyNumberMethods of the scalar object with the ufuncs. Defining > PyNumberMethods won't usually give you the ability to calculate ufuncs. Okay, is my understanding here correct? I am defining two type descriptors: PyArray_Descr mxNumpyType - describes the Numpy array type. PyTypeObject mxNumpyDataType - describes the data type of the contents of the array (i.e. mxNumpyType->typeobj points to this), inherits from PyDoubleArrType_Type and overrides some fields as mentioned above. And the getitem and setitem functions are designed to only give/take PyObject* of type mxDateTime. I guess it's not clear to me whether the abs() referred to by the error is an abs() ufunc or the nb_absolute pointer in the PyNumberMethods. Let me try overriding ufuncs and get back to you... Perhaps you just want to construct an "object" array of mxDateTime's. > What is the reason you want to define an mxDateTime data-type? Currently I am using an object array of mxDateTime's, but it's rather frustrating that I can't treat them as normal floats internally since that's really all they are. Jonathan |
From: Robert K. <rob...@gm...> - 2006-10-26 23:49:58
|
Jonathan Wang wrote: > It's just confusing as the documentation indicates that the setitem > function should return 0 for success and a negative number for failure. > But within Array_FromPyScalar, we have: > > ret->descr->f->setitem(op, ret->data, ret); > > if (PyErr_Occurred()) { > Py_DECREF(ret); > return NULL; > } else { > return (PyObject *)ret; > } > > So, someone reading the documentation could return -1 on failure without > setting the Python error flag, and the function would happily continue > on its way and fail to perform the proper casts. That's a documentation vagueness, then. This is a convention established by the Python C API. If an error happens in a function that returns PyObject*, then it should return NULL to inform the caller that an error happened; other functions should return 0 for success and -1 for an error. However, the function must still set an exception object. The rest is just a convenient convention. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco |
From: Travis O. <oli...@ee...> - 2006-10-27 00:15:09
|
> It's just confusing as the documentation indicates that the setitem > function should return 0 for success and a negative number for > failure. But within Array_FromPyScalar, we have: > > ret->descr->f->setitem(op, ret->data, ret); > > if (PyErr_Occurred()) { > Py_DECREF(ret); > return NULL; > } else { > return (PyObject *)ret; > } > I see the problem. We are assuming an error is set on failure, so both -1 should be returned and an error condition set for your own setitem function. This is typical Python behavior. I'll fix the documentation. > > > > > I seem to be able to load values into the array, but I can't extract > > anything out of the array, even to print it. In gdb I've > verified that > > loading DateTime.now() correctly puts a float representation of the > > date into my array. However, if I try to get the value out, I get an > > error: > > >>> mxArr[0] = DateTime.now() > > >>> mxArr[0] > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > File "/usr/lib/python2.4/site-packages/numpy/core/numeric.py", > line > > 391, in array_repr > > ', ', "array(") > > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > > line 204, in array2string > > separator, prefix) > > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > > line 160, in _array2string > > format = _floatFormat(data, precision, suppress_small) > > File "/usr/lib/python2.4/site-packages/numpy/core/arrayprint.py", > > line 281, in _floatFormat > > non_zero = _uf.absolute(data.compress(_uf.not_equal(data, 0))) > > TypeError: bad operand type for abs() > > > > I'm not sure why it's trying to call abs() on my object to print it. > > Because that's the implication of inheriting from a double. It's > just > part of the code that tries to format your values into an array > (notice > the _floatFormat). I actually borrowed this code from numarray so I > can't speak to exactly what it's doing without more study. > > > Hmm, so does Numpy ignore the tp_repr and tp_str fields in the > PyTypeObject of the underlying type? admittedly haven't had a chance > to look at this code closely yet. How arrays print is actually user-settable. The default printing function does indeed ignore tp_repr and tp_str of the underlying scalar objects in order to be able to set precisions. Now, we could probably fix the default printing function to actually use the tp_repr and/or tp_str fields of the corresponding scalar objects. This is worth filing a ticket about. In the mean time you can create a new array print function that checks for your data-type as the type of the array and then does something different otherwise it calls the old function. Then, register this function as the print function for arrays. > > > > I have a separate PyNumberMethods attached to my object type, copied > > from the float scalar type, and nb_absolute is set to 0. When I > break > > at the various functions I've registered, the last thing Numpy > tries > > to do is cast my custom data type to an object type (which it > does so > > successfully) via _broadcast_cast. > > Don't confuse the Python object associated when an element of the > array > is extracted and the data-type of the array. Also don't confuse the > PyNumberMethods of the scalar object with the ufuncs. Defining > PyNumberMethods won't usually give you the ability to calculate > ufuncs. > > > Okay, is my understanding here correct? I am defining two type > descriptors: > PyArray_Descr mxNumpyType - describes the Numpy array type. > PyTypeObject mxNumpyDataType - describes the data type of the contents > of the array (i.e. mxNumpyType->typeobj points to this), inherits from > PyDoubleArrType_Type and overrides some fields as mentioned above. > The nomenclature is that mxNumPyType is the data-type of the array and your PyTypeObject is the "type" of the elements of the array. So, you have the names a bit backward. So, to correspond with the way I use the words "type" and "data-type", I would name them: PyArray_Descr mxNumpyDataType PyTypeObject mxNumpyType > And the getitem and setitem functions are designed to only give/take > PyObject* of type mxDateTime. > These are in the 'f' member of the PyArray_Descr structure, so presumably you have also filled in your PyArray_Descr structure with items from PyArray_DOUBLE? > I guess it's not clear to me whether the abs() referred to by the > error is an abs() ufunc or the nb_absolute pointer in the > PyNumberMethods. Let me try overriding ufuncs and get back to you... > > Perhaps you just want to construct an "object" array of mxDateTime's. > What is the reason you want to define an mxDateTime data-type? > > > Currently I am using an object array of mxDateTime's, but it's rather > frustrating that I can't treat them as normal floats internally since > that's really all they are. Ah, I see. So, you would like to be able to say view the array of mxDateTimes as an array of "floats" (using the .view method). You are correct that this doesn't make sense when you are talking about objects, but might if mxDateTime objects are really just floats. I just wanted to make sure you were aware of the object array route. The new data-type route is less well traveled but I'm anxious to smooth the wrinkles out. Your experiences will help. Bascially we are moving from Numeric being a "builtin data-types" only to a NumPy that has "arbitrary" data-types with a few special-cased "builtins" We need more experience to clarify the issues. Your identification of problems in the default printing, for example, is one thing that will help. Keep us posted. I'd love to here how things went and what can be done to improve. -Travis |
From: Jonathan W. <jon...@gm...> - 2006-10-27 14:35:16
|
On 10/26/06, Travis Oliphant <oli...@ee...> wrote: > > > Okay, is my understanding here correct? I am defining two type > > descriptors: > > PyArray_Descr mxNumpyType - describes the Numpy array type. > > PyTypeObject mxNumpyDataType - describes the data type of the contents > > of the array (i.e. mxNumpyType->typeobj points to this), inherits from > > PyDoubleArrType_Type and overrides some fields as mentioned above. > > > The nomenclature is that mxNumPyType is the data-type of the array and > your PyTypeObject is the "type" of the elements of the array. So, you > have the names a bit backward. > > So, to correspond with the way I use the words "type" and "data-type", I > would name them: > > PyArray_Descr mxNumpyDataType > PyTypeObject mxNumpyType Okay, I will use this convention going forwards. > And the getitem and setitem functions are designed to only give/take > > PyObject* of type mxDateTime. > > > These are in the 'f' member of the PyArray_Descr structure, so > presumably you have also filled in your PyArray_Descr structure with > items from PyArray_DOUBLE? That's correct. I have all members of the 'f' member identical to that from PyArray_DOUBLE, except: mxNumpyType->f->dotfunc = NULL; mxNumpyType->f->getitem = date_getitem; mxNumpyType->f->setitem = date_setitem; mxNumpyType->f->cast[PyArray_DOUBLE] = (PyArray_VectorUnaryFunc*) dateToDouble; mxNumpyType->f->cast[PyArray_OBJECT] = (PyArray_VectorUnaryFunc*) dateToObject; All other cast functions are NULL. If I redefine the string function, I encounter another, perhaps more serious problem leading to a segfault. I've defined my string function to be extremely simple: >>> def printer(arr): ... return str(arr[0]) Now, if I try to print an element of the array: >>> mxArr[0] I get to this stack trace: #0 scalar_value (scalar=0x814be10, descr=0x5079e0) at scalartypes.inc.src :68 #1 0x0079936a in PyArray_Scalar (data=0x814cf98, descr=0x5079e0, base=0x814e7a8) at arrayobject.c:1419 #2 0x007d259f in array_subscript_nice (self=0x814e7a8, op=0x804eb8c) at arrayobject.c:1985 #3 0x00d17dde in PyObject_GetItem (o=0x814e7a8, key=0x804eb8c) at Objects/abstract.c:94 (Note: for some reason gdb claims that arrayobject.c:1985 is array_subscript_nice, but looking at my source this line is actually in array_item_nice. *boggle*) But scalar_value returns NULL for all non-native types. So, destptr in PyArray_Scalar is set to NULL, and the call the copyswap segfaults. Perhaps scalar_value should be checking the scalarkind field of PyArray_Descr, or using the elsize and alignment fields to figure out the pointer to return if scalarkind isn't set? |
From: Travis O. <oli...@ie...> - 2006-10-27 16:03:39
|
Jonathan Wang wrote: > On 10/26/06, *Travis Oliphant* <oli...@ee... > <mailto:oli...@ee...>> wrote: > > > Okay, is my understanding here correct? I am defining two type > > descriptors: > > PyArray_Descr mxNumpyType - describes the Numpy array type. > > PyTypeObject mxNumpyDataType - describes the data type of the > contents > > of the array (i.e. mxNumpyType->typeobj points to this), > inherits from > > PyDoubleArrType_Type and overrides some fields as mentioned above. > > > The nomenclature is that mxNumPyType is the data-type of the array and > your PyTypeObject is the "type" of the elements of the array. > So, you > have the names a bit backward. > > So, to correspond with the way I use the words "type" and > "data-type", I > would name them: > > PyArray_Descr mxNumpyDataType > PyTypeObject mxNumpyType > > > Okay, I will use this convention going forwards. > > > And the getitem and setitem functions are designed to only > give/take > > PyObject* of type mxDateTime. > > > These are in the 'f' member of the PyArray_Descr structure, so > presumably you have also filled in your PyArray_Descr structure with > items from PyArray_DOUBLE? > > > That's correct. I have all members of the 'f' member identical to that > from PyArray_DOUBLE, except: > > mxNumpyType->f->dotfunc = NULL; > mxNumpyType->f->getitem = date_getitem; > mxNumpyType->f->setitem = date_setitem; > mxNumpyType->f->cast[PyArray_DOUBLE] = (PyArray_VectorUnaryFunc*) > dateToDouble; > mxNumpyType->f->cast[PyArray_OBJECT] = (PyArray_VectorUnaryFunc*) > dateToObject; > > All other cast functions are NULL. > > If I redefine the string function, I encounter another, perhaps more > serious problem leading to a segfault. I've defined my string function > to be extremely simple: > >>> def printer(arr): > ... return str(arr[0]) > > Now, if I try to print an element of the array: > >>> mxArr[0] > > I get to this stack trace: > #0 scalar_value (scalar=0x814be10, descr=0x5079e0) at > scalartypes.inc.src:68 > #1 0x0079936a in PyArray_Scalar (data=0x814cf98, descr=0x5079e0, > base=0x814e7a8) at arrayobject.c:1419 > #2 0x007d259f in array_subscript_nice (self=0x814e7a8, op=0x804eb8c) > at arrayobject.c:1985 > #3 0x00d17dde in PyObject_GetItem (o=0x814e7a8, key=0x804eb8c) at > Objects/abstract.c:94 > > (Note: for some reason gdb claims that arrayobject.c:1985 is > array_subscript_nice, but looking at my source this line is actually > in array_item_nice. *boggle*) > > But scalar_value returns NULL for all non-native types. So, destptr in > PyArray_Scalar is set to NULL, and the call the copyswap segfaults. > > Perhaps scalar_value should be checking the scalarkind field of > PyArray_Descr, or using the elsize and alignment fields to figure out > the pointer to return if scalarkind isn't set? Hmmm... It looks like the modifications to scalar_value did not take into account user-defined types. I've added a correction so that user-defined types will use setitem to set the scalar value into the array. Presumably your setitem function can handle setting the array with scalars of your new type? I've checked the changes into SVN. -Travis |
From: Jonathan W. <jon...@gm...> - 2006-10-27 19:37:13
|
On 10/27/06, Travis Oliphant <oli...@ie...> wrote: > > > If I redefine the string function, I encounter another, perhaps more > > serious problem leading to a segfault. I've defined my string function > > to be extremely simple: > > >>> def printer(arr): > > ... return str(arr[0]) > > > > Now, if I try to print an element of the array: > > >>> mxArr[0] > > > > I get to this stack trace: > > #0 scalar_value (scalar=0x814be10, descr=0x5079e0) at > > scalartypes.inc.src:68 > > #1 0x0079936a in PyArray_Scalar (data=0x814cf98, descr=0x5079e0, > > base=0x814e7a8) at arrayobject.c:1419 > > #2 0x007d259f in array_subscript_nice (self=0x814e7a8, op=0x804eb8c) > > at arrayobject.c:1985 > > #3 0x00d17dde in PyObject_GetItem (o=0x814e7a8, key=0x804eb8c) at > > Objects/abstract.c:94 > > > > (Note: for some reason gdb claims that arrayobject.c:1985 is > > array_subscript_nice, but looking at my source this line is actually > > in array_item_nice. *boggle*) > > > > But scalar_value returns NULL for all non-native types. So, destptr in > > PyArray_Scalar is set to NULL, and the call the copyswap segfaults. > > > > Perhaps scalar_value should be checking the scalarkind field of > > PyArray_Descr, or using the elsize and alignment fields to figure out > > the pointer to return if scalarkind isn't set? > > Hmmm... It looks like the modifications to scalar_value did not take > into account user-defined types. I've added a correction so that > user-defined types will use setitem to set the scalar value into the > array. Presumably your setitem function can handle setting the array > with scalars of your new type? > > I've checked the changes into SVN. Do there also need to be changes in scalartypes.inc.src to use getitem if a user-defined type does not inherit from a Numpy scalar? i.e. at scalartypes.inc.src:114 we should return some pointer calculated from the PyArray_Descr's elsize and alignment field to get the destination for the "custom scalar" type to be copied. As it stands, if the user-defined type does not inherit from a Numpy scalar, lots of things continue to break. Furthermore it seems like the scalar conversions prefer the builtin types, but it seems to me that the user-defined type should be preferred. i.e. if I try to get an element from my mxDateTime array, I get a float back: >>> mxArr[0] = DateTime.now() >>> mxArr[0][0] 732610.60691268521 But what I really want is the mxDateTime, which, oddly enough, is what happens if I use tolist(): >>> mxArr.tolist()[0] [<DateTime object for '2006-10-27 14:33:57.25' at a73c60>] |
From: Travis O. <oli...@ie...> - 2006-10-27 20:01:24
|
Jonathan Wang wrote: > On 10/27/06, *Travis Oliphant* <oli...@ie... > <mailto:oli...@ie...>> wrote: > > > If I redefine the string function, I encounter another, perhaps more > > serious problem leading to a segfault. I've defined my string > function > > to be extremely simple: > > >>> def printer(arr): > > ... return str(arr[0]) > > > > Now, if I try to print an element of the array: > > >>> mxArr[0] > > > > I get to this stack trace: > > #0 scalar_value (scalar=0x814be10, descr=0x5079e0) at > > scalartypes.inc.src:68 > > #1 0x0079936a in PyArray_Scalar (data=0x814cf98, descr=0x5079e0, > > base=0x814e7a8) at arrayobject.c:1419 > > #2 0x007d259f in array_subscript_nice (self=0x814e7a8, > op=0x804eb8c) > > at arrayobject.c:1985 > > #3 0x00d17dde in PyObject_GetItem (o=0x814e7a8, key=0x804eb8c) at > > Objects/abstract.c:94 > > > > (Note: for some reason gdb claims that arrayobject.c:1985 is > > array_subscript_nice, but looking at my source this line is > actually > > in array_item_nice. *boggle*) > > > > But scalar_value returns NULL for all non-native types. So, > destptr in > > PyArray_Scalar is set to NULL, and the call the copyswap segfaults. > > > > Perhaps scalar_value should be checking the scalarkind field of > > PyArray_Descr, or using the elsize and alignment fields to > figure out > > the pointer to return if scalarkind isn't set? > > Hmmm... It looks like the modifications to scalar_value did not take > into account user-defined types. I've added a correction so that > user-defined types will use setitem to set the scalar value into the > array. Presumably your setitem function can handle setting the array > with scalars of your new type? > > I've checked the changes into SVN. > > > Do there also need to be changes in scalartypes.inc.src to use getitem > if a user-defined type does not inherit from a Numpy scalar? This needs to be clarified. I don't think it's possible to do it without inheriting from a numpy scalar at this point (the void numpy scalar can be inherited from and is pretty generic). I know I was not considering that case when I wrote the code. > i.e. at scalartypes.inc.src:114 we should return some pointer > calculated from the PyArray_Descr's elsize and alignment field to get > the destination for the "custom scalar" type to be copied. I think this is a good idea. I doubt it's enough to fix all places that don't inherit from numpy scalars, but it's a start. It seems like we need to figure out where the beginning of the data is for the type which is assumed to be defined on alignment boundaries after a PyObject_HEAD (right)? This could actually be used for everything and all the switch and if statements eliminated. I think the alignment field is the only thing needed, though. I don't see how I would use the elsize field? > > As it stands, if the user-defined type does not inherit from a Numpy > scalar, lots of things continue to break. Not surprising, I did not make sure and support this. > Furthermore it seems like the scalar conversions prefer the builtin > types, but it seems to me that the user-defined type should be preferred. I'm not sure what this means. > > > i.e. if I try to get an element from my mxDateTime array, I get a > float back: > >>> mxArr[0] = DateTime.now() > >>> mxArr[0][0] > 732610.60691268521 Why can you index mxArr[0]? What is mxArr[0]? If it's a scalar, then why can you index it? What is type(mxArr[0])? > > But what I really want is the mxDateTime, which, oddly enough, is what > happens if I use tolist(): > >>> mxArr.tolist()[0] > [<DateTime object for '2006-10-27 14:33:57.25' at a73c60>] That's not surprising because tolist just calls getitem on each element in the array to construct the list. -Travis |
From: Jonathan W. <jon...@gm...> - 2006-10-27 21:08:11
|
On 10/27/06, Travis Oliphant <oli...@ie...> wrote: > > Jonathan Wang wrote: > > On 10/27/06, *Travis Oliphant* <oli...@ie... > > <mailto:oli...@ie...>> wrote: > > > > > If I redefine the string function, I encounter another, perhaps > more > > > serious problem leading to a segfault. I've defined my string > > function > > > to be extremely simple: > > > >>> def printer(arr): > > > ... return str(arr[0]) > > > > > > Now, if I try to print an element of the array: > > > >>> mxArr[0] > > > > > > I get to this stack trace: > > > #0 scalar_value (scalar=0x814be10, descr=0x5079e0) at > > > scalartypes.inc.src:68 > > > #1 0x0079936a in PyArray_Scalar (data=0x814cf98, descr=0x5079e0, > > > base=0x814e7a8) at arrayobject.c:1419 > > > #2 0x007d259f in array_subscript_nice (self=0x814e7a8, > > op=0x804eb8c) > > > at arrayobject.c:1985 > > > #3 0x00d17dde in PyObject_GetItem (o=0x814e7a8, key=0x804eb8c) at > > > Objects/abstract.c:94 > > > > > > (Note: for some reason gdb claims that arrayobject.c:1985 is > > > array_subscript_nice, but looking at my source this line is > > actually > > > in array_item_nice. *boggle*) > > > > > > But scalar_value returns NULL for all non-native types. So, > > destptr in > > > PyArray_Scalar is set to NULL, and the call the copyswap > segfaults. > > > > > > Perhaps scalar_value should be checking the scalarkind field of > > > PyArray_Descr, or using the elsize and alignment fields to > > figure out > > > the pointer to return if scalarkind isn't set? > > > > Hmmm... It looks like the modifications to scalar_value did not take > > into account user-defined types. I've added a correction so that > > user-defined types will use setitem to set the scalar value into the > > array. Presumably your setitem function can handle setting the > array > > with scalars of your new type? > > > > I've checked the changes into SVN. > > > > > > Do there also need to be changes in scalartypes.inc.src to use getitem > > if a user-defined type does not inherit from a Numpy scalar? > This needs to be clarified. I don't think it's possible to do it > without inheriting from a numpy scalar at this point (the void numpy > scalar can be inherited from and is pretty generic). I know I was not > considering that case when I wrote the code. > > i.e. at scalartypes.inc.src:114 we should return some pointer > > calculated from the PyArray_Descr's elsize and alignment field to get > > the destination for the "custom scalar" type to be copied. > I think this is a good idea. I doubt it's enough to fix all places that > don't inherit from numpy scalars, but it's a start. > > It seems like we need to figure out where the beginning of the data is > for the type which is assumed to be defined on alignment boundaries > after a PyObject_HEAD (right)? This could actually be used for > everything and all the switch and if statements eliminated. > > I think the alignment field is the only thing needed, though. I don't > see how I would use the elsize field? Hmm, yeah, I guess alignment would be sufficient. Worst case, you could delegate to setitem, right? It would be useful to support arbitrary types. Suppose, for example, that I wanted to make an array of structs. In keeping with the date/time example, I might want to store a long and a double, the long for days in the Gregorian calendar and the double for seconds from midnight on that day. > Furthermore it seems like the scalar conversions prefer the builtin > > types, but it seems to me that the user-defined type should be > preferred. > I'm not sure what this means. > > > > > > i.e. if I try to get an element from my mxDateTime array, I get a > > float back: > > >>> mxArr[0] = DateTime.now() > > >>> mxArr[0][0] > > 732610.60691268521 > Why can you index mxArr[0]? What is mxArr[0]? If it's a scalar, then > why can you index it? What is type(mxArr[0])? Ah, I am mistaken here - I am correctly getting my mxNumpyDateTime type back: mxArr is a 1x1 matrix: >>> mxArr = numpy.empty((1,1), dtype = libMxNumpy.type) >>> mxArr[0] = DateTime.now() >>> type(mxArr) <type 'numpy.ndarray'> >>> type(mxArr[0]) <type 'numpy.ndarray'> >>> type(mxArr[0][0]) <type 'mxNumpyDateTime'> >>> mxArr.shape (1, 1) > But what I really want is the mxDateTime, which, oddly enough, is what > > happens if I use tolist(): > > >>> mxArr.tolist()[0] > > [<DateTime object for '2006-10-27 14:33:57.25' at a73c60>] > > That's not surprising because tolist just calls getitem on each element > in the array to construct the list. I guess this is a degenerate case, since I have getitem returning a mxDateTime while the actual type of the elements in the array is mxNumpyDateTime (i.e. mxNumpyType). Would the correct behavior, then, be for getitem to return a mxNumpyDateTime and register the object cast function to return a mxDateTime? If I try to do math on the array, it seems like the operation is performed via object pointers (mxDateTime - mxDateTime returns a DateTimeDelta object, and mxNumpyDateTime is a float): >>> mxArr = numpy.empty((1,1), dtype = libMxNumpy.type) >>> mxArr[0][0] = DateTime.now() >>> mxArr2 = numpy.empty((1,1), dtype = libMxNumpy.type) >>> mxArr2[0][0] = DateTime.DateTimeFrom('2006-01-01') >>> type(mxArr[0][0]) <type 'mxNumpyDateTime'> >>> type(mxArr2[0][0]) <type 'mxNumpyDateTime'> >>> sub = mxArr - mxArr2 >>> type(sub[0][0]) <type 'DateTimeDelta'> I'm guessing I need to register ufunc loops for all the basic math on my types? |
From: David D. <dav...@lo...> - 2006-11-16 16:44:51
|
On Thu, Oct 26, 2006 at 05:26:47PM -0500, Jonathan Wang wrote: > I'm trying to write a Numpy extension that will encapsulate mxDateTime as= a > native Numpy type. I've decided to use a type inherited from Numpy's scal= ar > double. However, I'm running into all sorts of problems. I'm using numpy > 1.0b5; I realize this is somewhat out of date. >=20 Hi, just to ask you: how is the work going on encapsulatinsg mx.DateTime as a native numpy type?=20 And most important: is the code available somewhere? I am also interested in using DateTime objects in numpy arrays. For now, I've always used arrays of floats (using gmticks values of dates). Thanks you, David --=20 David Douard LOGILAB, Paris (France) Formations Python, Zope, Plone, Debian : http://www.logilab.fr/formations D=E9veloppement logiciel sur mesure : http://www.logilab.fr/services Informatique scientifique : http://www.logilab.fr/science |
From: Pierre GM <pgm...@gm...> - 2006-11-16 17:01:13
|
On Thursday 16 November 2006 11:44, David Douard wrote: > Hi, just to ask you: how is the work going on encapsulatinsg mx.DateTime > as a native numpy type? > And most important: is the code available somewhere? I am also > interested in using DateTime objects in numpy arrays. For now, I've > always used arrays of floats (using gmticks values of dates). And I, as arrays of objects (well, I wrote a subclass to deal with dates, where each element is a datetime object, with methods to translate to floats or strings , but it's far from optimal...). I'd also be quite interested in checking what has been done. |
From: Colin J. W. <cj...@sy...> - 2006-11-16 17:40:29
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> <br> <br> David Douard wrote: <blockquote cite="mid...@cr..." type="cite"> <pre wrap="">On Thu, Oct 26, 2006 at 05:26:47PM -0500, Jonathan Wang wrote: </pre> <blockquote type="cite"> <pre wrap="">I'm trying to write a Numpy extension that will encapsulate mxDateTime as a native Numpy type. I've decided to use a type inherited from Numpy's scalar double. However, I'm running into all sorts of problems. I'm using numpy 1.0b5; I realize this is somewhat out of date. </pre> </blockquote> <pre wrap=""><!----> Hi, just to ask you: how is the work going on encapsulatinsg mx.DateTime as a native numpy type? And most important: is the code available somewhere? I am also interested in using DateTime objects in numpy arrays. For now, I've always used arrays of floats (using gmticks values of dates). Thanks you, David </pre> </blockquote> It would be nice if dtype were subclassable to handle this sort of thing.<br> <br> Colin W. </body> </html> |