From: Matthieu P. <pe...@sh...> - 2006-06-16 18:01:53
Hi,

I need to handle strings through a numpy array whose data belongs to a C structure. There are several possible answers to this problem:

1) Use a numpy array of strings (PyArray_STRING), backed by a (char *) buffer in C. It works as is, but you have to fix a maximum size for your strings, because the whole set of strings is contiguous in memory.

2) Use a numpy array of objects (PyArray_OBJECT) and wrap each C string in a Python object, a PyStringObject for example. The problem is that you then need as many wrappers as data elements, and I believe the data cannot be shared once a PyStringObject has been created from a (char *), with PyString_FromStringAndSize for example, since that call copies the characters.

Now I will present a third way, which lets you use strings with no size limit (unlike solution 1) and does not create wrappers until you really need them (on demand, at access time).

First, for convenience, we use the (char **) type in C to build an array of string pointers (as was suggested in solution 2). The game is then to make it work with the numpy API and to use it from Python through a numpy array. Basically, I want behaviour very similar to arrays of PyObject, where the data are not contiguous, only their addresses are. So the idea is to create a new array descr based on PyArray_OBJECT and to change its getitem/setitem functions to deal with my own data.

I expected numpy to work with this array descr, but it fails because PyArray_Scalar (arrayobject.c) does not call the descriptor's getitem function in the PyArray_OBJECT case; instead it runs two lines that were copied and pasted from the OBJECT_getitem function.
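To make the idea and the failure concrete, here is a rough pure-Python analogy built on ctypes. It is not the numpy C API and not my actual code: StringTable, LazyStringDescr, scalar_hardcoded and scalar_via_descr are hypothetical names invented for this sketch. It only illustrates a table of string pointers, wrappers created on access, and the difference between a scalar path that bypasses the descr and one that dispatches to its getitem.

```python
import ctypes


class StringTable:
    """Stand-in for a C (char **) buffer: the pointers are contiguous,
    the characters they point to are not."""

    def __init__(self, strings):
        encoded = [s.encode("utf-8") for s in strings]
        self.pointers = (ctypes.c_char_p * len(encoded))(*encoded)


class LazyStringDescr:
    """Hypothetical analogue of an array descr with custom getitem/setitem."""

    def getitem(self, table, index):
        # The Python wrapper (a str) is created only here, at access time.
        return table.pointers[index].decode("utf-8")

    def setitem(self, table, index, value):
        # Store a pointer to the new string; no maximum length is imposed.
        table.pointers[index] = value.encode("utf-8")


def scalar_hardcoded(table, index):
    """Analogue of a scalar path that ignores the descr: it hands back the
    raw slot content, which here is only an address."""
    slots = ctypes.cast(table.pointers, ctypes.POINTER(ctypes.c_void_p))
    return slots[index]


def scalar_via_descr(descr, table, index):
    """Analogue of dispatching to the descr's own getitem."""
    return descr.getitem(table, index)


table = StringTable(["plop", "blups"])
descr = LazyStringDescr()
descr.setitem(table, 0, "youpiiii")

print(scalar_via_descr(descr, table, 0))          # the string, built on demand
print(type(scalar_hardcoded(table, 0)).__name__)  # the raw slot is not a string
```

In this sketch only the dispatching path knows how to turn a slot into a string; the hard-coded path sees nothing but a pointer value, which is the kind of mismatch that shows up when the descr's getitem is bypassed.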
Here is my small patch: in arrayobject.c (lines 983-984), replace

    Py_INCREF(*((PyObject **)data));
    return *((PyObject **)data);

by

    return descr->f->getitem(data, base);

I played a lot with my new numpy array after this change and noticed that a lot of uses work:

    >>> a = myArray()
    array([["plop", "blups"]], dtype=object)
    >>> print a
    [["plop", "blups"]]
    >>> a[0, 0] = "youpiiii"
    >>> print a
    [["youpiiii", "blups"]]
    >>> s = a[0, 0]
    >>> print s
    "youpiiii"
    >>> b = a[:]  # data is shared with 'a' (same behaviour as an array of objects)
    >>>
    >>> numpy.zeros(1, dtype = a.dtype)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: fields with object members not yet supported.
    >>> numpy.array(a)
    segmentation fault

Finally, I found a missing check in multiarraymodule.c (function _array_fromobject). After the label finish (line 4661), add:

    if (!ret) {
        Py_INCREF(Py_None);
        return Py_None;
    }

After this change, I obtained (when I was not in interactive mode):

    # numpy.array(a)
    Exception exceptions.TypeError: 'fields with object members not yet supported.' in 'garbage collection' ignored
    Fatal Python error: unexpected exception during garbage collection
    Abandon

But strangely, in interactive mode, the first time it fails and raises an exception (the right behaviour), and the next time it only returns None:

    >>> numpy.array(myArray())
    TypeError: fields with object members not yet supported.
    >>> a=numpy.array(myArray()); print a
    None

A bug remains (I will explore it later), but it is better than before. This mail shows how to map a (char **) onto a numpy array, but it is easy to use the same idea to handle any type (your_object **). I will be pleased to discuss any comments on the proposed solution, or any other solutions you can find.

-- 
Matthieu Perrot                 Tel: +33 1 69 86 78 21
CEA - SHFJ                      Fax: +33 1 69 86 77 86
4, place du General Leclerc
91401 Orsay Cedex France