From: Matthieu P. <pe...@sh...> - 2006-06-16 18:01:53
Hi,

I need to handle strings, shaped by a numpy array, whose data are owned by a C structure. There are several possible answers to this problem:

1) Use a numpy array of strings (PyArray_STRING), and so a (char *) object in C. It works as is, but you need to define a maximum size for your strings, because your set of strings is contiguous in memory.

2) Use a numpy array of objects (PyArray_OBJECT), and wrap each «C string» with a Python object, using PyStringObject for example. The problem then is that there is one wrapper per data element, and I believe the data can't be shared when you create a PyStringObject from a (char *), using PyString_AsStringAndSize for example.

Now, I will present a third way. It allows you to use strings that are not size-limited (unlike solution 1), and it doesn't create wrappers until you really need them (on demand/access).

First, for convenience, we will use the (char **) type in C to build an array of string pointers (as suggested in solution 2). Now, the game is to make it work with the numpy API, and to use it from Python as an array. Basically, I want a behaviour very similar to arrays of PyObject, where the data are not contiguous, only their addresses are. So the idea is to create a new array descr based on PyArray_OBJECT and change its getitem/setitem functions to deal with my own data.

I expected numpy to work with this convenient array descr, but it fails. In the PyArray_OBJECT case, PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem function; instead it runs two lines that were copy/pasted from the OBJECT_getitem function.
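The (char **) layout and the dispatch problem described above can be modelled in plain C. The sketch below is a toy, not numpy's API. The names toy_descr, TOY_OBJECT, toy_elem, wrappers_built and the cstr_* hooks are hypothetical stand-ins for a PyArray_Descr and its getitem/setitem slots. scalar_before and scalar_after reduce PyArray_Scalar to its OBJECT branch. The counter stands in for building a Python wrapper on demand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only, NOT numpy's API: toy_descr stands in for a PyArray_Descr
 * and its descr->f->getitem / descr->f->setitem slots. */
enum { TOY_OBJECT = 17 };  /* hypothetical type number for "object" */

typedef struct toy_descr {
    int type_num;                                  /* e.g. TOY_OBJECT */
    size_t elsize;                                 /* sizeof(char *) here */
    const char *(*getitem)(void *data);            /* read one element */
    int (*setitem)(const char *value, void *data); /* write one element */
} toy_descr;

/* Counts calls to the custom getitem; in numpy this is where a Python
 * string wrapper would be created on demand. */
static int wrappers_built = 0;

/* Address of element i: only the pointers are contiguous, not the strings. */
static void *toy_elem(void *base, size_t i, const toy_descr *d) {
    return (char *)base + i * d->elsize;
}

/* data points at one (char *) slot of the (char **) buffer. */
static const char *cstr_getitem(void *data) {
    wrappers_built++;
    return *(char **)data;
}

/* This model owns its strings, so setitem frees the old one. */
static int cstr_setitem(const char *value, void *data) {
    char *copy = malloc(strlen(value) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, value);
    free(*(char **)data);
    *(char **)data = copy;
    return 0;
}

/* Before the patch: the object case reads the slot itself, so the
 * descriptor's getitem is never called. */
static const char *scalar_before(const toy_descr *d, void *data) {
    if (d->type_num == TOY_OBJECT)
        return *(char **)data;
    return d->getitem(data);
}

/* After the patch: always go through the descriptor. */
static const char *scalar_after(const toy_descr *d, void *data) {
    assert(d->getitem != NULL);
    return d->getitem(data);
}
```

In this model, reading through scalar_before leaves wrappers_built at 0, because the object fast path reads the slot itself. scalar_after increments the counter on every access, which is the difference the patch makes. Only the element pointers sit at elsize-spaced addresses; the strings themselves can live anywhere on the heap.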
Here is my small patch. Replace (arrayobject.c:983-984):

    Py_INCREF(*((PyObject **)data));
    return *((PyObject **)data);

by:

    return descr->f->getitem(data, base);

I played a lot with my new numpy array after this change and noticed that a lot of uses work:

>>> a = myArray()
array([["plop", "blups"]], dtype=object)
>>> print a
[["plop", "blups"]]
>>> a[0, 0] = "youpiiii"
>>> print a
[["youpiiii", "blups"]]
>>> s = a[0, 0]
>>> print s
"youpiiii"
>>> b = a[:]  # data is shared with 'a' (similar behaviour to an array of objects)
>>>
>>> numpy.zeros(1, dtype = a.dtype)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: fields with object members not yet supported.
>>> numpy.array(a)
segmentation fault

Finally, I found a forgotten check in multiarraymodule.c (the _array_fromobject function). After the label finish (line 4661), add:

    if (!ret) {
        Py_INCREF(Py_None);
        return Py_None;
    }

After this change, I obtained (when not in interactive mode):

# numpy.array(a)
Exception exceptions.TypeError: 'fields with object members not yet supported.' in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection
Aborted

But strangely, in interactive mode, one time it fails and raises an exception (good behaviour), and the next time it only returns None:

>>> numpy.array(myArray())
TypeError: fields with object members not yet supported.
>>> a=numpy.array(myArray()); print a
None

A bug remains (I will explore it later), but it is better than before.

This mail shows how to map (char **) onto a numpy array, but it's easy to use the same idea to handle any type (your_object **). I'll be pleased to discuss any comments on the proposed solution, or any other solutions you can find.

--
Matthieu Perrot Tel: +33 1 69 86 78 21
CEA - SHFJ Fax: +33 1 69 86 77 86
4, place du General Leclerc
91401 Orsay Cedex France
From: Benjamin T. <ben...@de...> - 2006-06-19 11:47:55
On Friday 16 June 2006 20:01, Matthieu Perrot wrote:
> Hi,
>
> I need to handle strings, shaped by a numpy array, whose data are owned by a C
(...)
> a new array descr based on PyArray_OBJECT and change its getitem/setitem

Hi,

It seems I had a similar problem when I tried to use numpy to map C++ STL vectors (which are contiguous structures). I tried to overload the getitem() field of my own dtype to build Python wrappers at runtime around the allocated array of C objects (i.e. NOT an array of Python objects).

Your suggested modification does seem to work for me, although I don't know whether it's the right solution.

Are there any plans to update the trunk with something similar?

--
Benjamin Thyreau
decideur.info
From: Travis O. <oli...@ie...> - 2006-06-20 09:24:47
Matthieu Perrot wrote:
> Hi,
>
> I need to handle strings, shaped by a numpy array, whose data are owned by a C structure. There are several possible answers to this problem:
>
> 1) Use a numpy array of strings (PyArray_STRING), and so a (char *) object in C. It works as is, but you need to define a maximum size for your strings, because your set of strings is contiguous in memory.
>
> 2) Use a numpy array of objects (PyArray_OBJECT), and wrap each «C string» with a Python object, using PyStringObject for example. The problem then is that there is one wrapper per data element, and I believe the data can't be shared when you create a PyStringObject from a (char *), using PyString_AsStringAndSize for example.
>
> Now, I will present a third way. It allows you to use strings that are not size-limited (unlike solution 1), and it doesn't create wrappers until you really need them (on demand/access).
>
> First, for convenience, we will use the (char **) type in C to build an array of string pointers (as suggested in solution 2). Now, the game is to make it work with the numpy API, and to use it from Python as an array. Basically, I want a behaviour very similar to arrays of PyObject, where the data are not contiguous, only their addresses are. So the idea is to create a new array descr based on PyArray_OBJECT and change its getitem/setitem functions to deal with my own data.
>
> I expected numpy to work with this convenient array descr, but it fails. In the PyArray_OBJECT case, PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem function; instead it runs two lines that were copy/pasted from the OBJECT_getitem function. Here is my small patch. Replace (arrayobject.c:983-984):
>
>     Py_INCREF(*((PyObject **)data));
>     return *((PyObject **)data);
>
> by:
>
>     return descr->f->getitem(data, base);
>
> I played a lot with my new numpy array after this change and noticed that a lot of uses work:

This is an interesting solution. I was not considering it, though, and so I'm not surprised you have problems. You can register new types, but basing them off of PyArray_OBJECT can be problematic because of the special-casing that is done in several places to manage reference counting.

You are supposed to register your own data-types and get your own typenumber. Then you can define all the functions for the entries as you wish.

Riding on the back of PyArray_OBJECT may work if you are clever, but it may also fail mysteriously because of a reference-count snafu.

Thanks for the tests and bug-reports. I have no problem changing the code as you suggest.

-Travis
From: Matthieu P. <pe...@sh...> - 2006-06-21 16:15:46
On Tuesday 20 June 2006 11:24, Travis Oliphant wrote:
> Matthieu Perrot wrote:
> > Hi,
> >
> > I need to handle strings, shaped by a numpy array, whose data are owned by a C structure. There are several possible answers to this problem:
> >
> > 1) Use a numpy array of strings (PyArray_STRING), and so a (char *) object in C. It works as is, but you need to define a maximum size for your strings, because your set of strings is contiguous in memory.
> >
> > 2) Use a numpy array of objects (PyArray_OBJECT), and wrap each «C string» with a Python object, using PyStringObject for example. The problem then is that there is one wrapper per data element, and I believe the data can't be shared when you create a PyStringObject from a (char *), using PyString_AsStringAndSize for example.
> >
> > Now, I will present a third way. It allows you to use strings that are not size-limited (unlike solution 1), and it doesn't create wrappers until you really need them (on demand/access).
> >
> > First, for convenience, we will use the (char **) type in C to build an array of string pointers (as suggested in solution 2). Now, the game is to make it work with the numpy API, and to use it from Python as an array. Basically, I want a behaviour very similar to arrays of PyObject, where the data are not contiguous, only their addresses are. So the idea is to create a new array descr based on PyArray_OBJECT and change its getitem/setitem functions to deal with my own data.
> >
> > I expected numpy to work with this convenient array descr, but it fails. In the PyArray_OBJECT case, PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem function; instead it runs two lines that were copy/pasted from the OBJECT_getitem function. Here is my small patch. Replace (arrayobject.c:983-984):
> >
> >     Py_INCREF(*((PyObject **)data));
> >     return *((PyObject **)data);
> >
> > by:
> >
> >     return descr->f->getitem(data, base);
> >
> > I played a lot with my new numpy array after this change and noticed that a lot of uses work:
>
> This is an interesting solution. I was not considering it, though, and so I'm not surprised you have problems. You can register new types, but basing them off of PyArray_OBJECT can be problematic because of the special-casing that is done in several places to manage reference counting.
>
> You are supposed to register your own data-types and get your own typenumber. Then you can define all the functions for the entries as you wish.
>
> Riding on the back of PyArray_OBJECT may work if you are clever, but it may also fail mysteriously because of a reference-count snafu.
>
> Thanks for the tests and bug-reports. I have no problem changing the code as you suggest.
>
> -Travis

Thanks for applying my suggestions.

I think you are suggesting this kind of declaration:

    PyArray_Descr *descr = PyArray_DescrNewFromType(PyArray_VOID);
    descr->f->getitem = (PyArray_GetItemFunc *) my_getitem;
    descr->f->setitem = (PyArray_SetItemFunc *) my_setitem;
    descr->elsize = sizeof(char *);
    PyArray_RegisterDataType(descr);

You are right that, without the last line, it works and it follows the C-API way. But if I register this array descr, the typenumber is bigger than what the PyTypeNum_ISFLEXIBLE macro considers to be a flexible type. So the returned scalar object is badly formed. Then I get a segmentation fault later, because the created voidscalar has a null descr pointer.

--
Matthieu Perrot
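The failure Matthieu describes comes down to a range check on type numbers. The sketch below is a simplified model, not numpy's code, and it rests on these assumptions:

- The constants are assumed to mirror numpy's ordering at the time: string, unicode and void are adjacent, and user types are numbered from a separate base.
- toy_register, toy_scalar and toy_make_scalar are hypothetical.
- The scalar constructor is reduced to the single decision that matters here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, mirroring the ordering of numpy's built-in type numbers:
 * STRING < UNICODE < VOID, with user-defined types numbered from USERDEF. */
enum { TOY_STRING = 18, TOY_UNICODE = 19, TOY_VOID = 20, TOY_USERDEF = 256 };

/* Range test in the spirit of PyTypeNum_ISFLEXIBLE. */
static bool toy_is_flexible(int type_num) {
    return type_num >= TOY_STRING && type_num <= TOY_VOID;
}

/* Hypothetical registration: hands out the next user type number, even
 * when the descriptor was cloned from VOID. */
static int n_user_types = 0;
static int toy_register(void) {
    return TOY_USERDEF + n_user_types++;
}

/* Simplified model of the reported failure, not numpy's code: the scalar
 * only keeps its descriptor when the type number is in the flexible range. */
typedef struct {
    int type_num;
    const void *descr;  /* NULL here models the "badly formed" scalar */
} toy_scalar;

static toy_scalar toy_make_scalar(int type_num, const void *descr) {
    toy_scalar s = { type_num, NULL };
    if (toy_is_flexible(type_num))
        s.descr = descr;  /* flexible path attaches the descriptor */
    return s;
}
```

Whatever built-in type the descriptor was cloned from, the registered number lands outside the [STRING, VOID] window in this model. The branch that would attach the descriptor is therefore skipped, which corresponds to the null descr pointer in the voidscalar he reports.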