From: Travis O. <oli...@ie...> - 2006-02-08 08:41:26
Francesc Altet wrote:

>Ok. I see that you got my point. Well, maybe I'm wrong here, but my
>proposal would result in implementing just one new data-type for 32-bit
>unicode when the python platform is UCS2 aware. If, as you said above,
>Py_UCS4 type is always defined, even on UCS2 interpreters, that should
>be relatively easy to do.
>

Hmm. I think I'm beginning to like your idea.

We could in fact make the NumPy Unicode type always UCS4 and then keep the Python Unicode scalar. On Python UCS2 builds the conversion would use UTF-16 to go to the Python scalar (which would always inherit from the native unicode type).

It would be the one data-type where the memory layout of the scalar and the array data-type do not match exactly, but because in this case there are conversions to go back and forth, it may not matter.

This would not be too difficult to implement, actually: it would require new functions to handle conversions in arraytypes.inc.src and some modifications to PyArray_Scalar.

The only drawbacks are that all unicode arrays on Python UCS2 builds would now be twice as large (4 bytes per character instead of 2), and the aforementioned asymmetry between the data-type and the array-scalar on those builds.

But, all in all, it sounds like a good plan. If somebody later wants to add a reduced-size UCS2 array of unicode characters, we can cross that bridge when we come to it.

I still like using explicit typecode characters in the array interface to denote the UCS2 or UCS4 data-type. We could still change from 'W', 'w' to other characters...

>Well, probably I've overlooked something, but I really think that this
>would be a nice thing to do.
>
>

There are details in the scalar-array conversions (getitem and setitem) that would have to be implemented, but it is possible. The UCS4 --> UTF-16 encoding is one of the easiest. It's done in unicodeobject.h in Python, but I'm not sure it's exposed other than going through the interpreter.

Does this seem like a solution that everyone can live with?

-Travis
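
To make the UCS4 <--> UTF-16 step mentioned above concrete, here is a minimal sketch of the surrogate-pair arithmetic in pure Python. It is only an illustration of the encoding itself, not the code in unicodeobject.h or anything in arraytypes.inc.src; the function names ucs4_to_utf16 and utf16_to_ucs4 are made up for this example, and it assumes the data is handled as plain lists of integers and that unpaired surrogates should be rejected.

```python
def ucs4_to_utf16(code_points):
    """Encode UCS4 code points as a list of 16-bit UTF-16 code units.

    Hypothetical helper for illustration; not a NumPy or CPython API.
    """
    units = []
    for cp in code_points:
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError("code point out of range: %r" % (cp,))
        if 0xD800 <= cp <= 0xDFFF:
            # Assumption: surrogate values are not valid as UCS4 data.
            raise ValueError("surrogate code point: 0x%X" % cp)
        if cp < 0x10000:
            # Basic Multilingual Plane: one code unit, value unchanged.
            units.append(cp)
        else:
            # Supplementary planes: split the 20-bit offset into two
            # 10-bit halves carried by a high and a low surrogate.
            offset = cp - 0x10000
            units.append(0xD800 | (offset >> 10))
            units.append(0xDC00 | (offset & 0x3FF))
    return units


def utf16_to_ucs4(units):
    """Decode a list of UTF-16 code units back to UCS4 code points.

    Hypothetical helper for illustration; not a NumPy or CPython API.
    """
    code_points = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= n or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate at index %d" % i)
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate at index %d" % i)
        else:
            code_points.append(u)
            i += 1
    return code_points
```

For example, U+1D11E (outside the 16-bit range) becomes a surrogate pair, while 'A' is passed through unchanged:

```python
print([hex(u) for u in ucs4_to_utf16([0x41, 0x1D11E])])
# ['0x41', '0xd834', '0xdd1e']
```

This also shows why the array and the scalar can differ in length on a UCS2 build: one 4-byte array element may map to two 2-byte units in the Python scalar.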