From: Tim H. <tim...@co...> - 2006-02-07 18:33:15
Eric Firing wrote:
> Francesc, Travis,
>
> Francesc Altet wrote:
> [...]
>
>> All in all, my opinion is that allowing the coexistence of different
>> sizes of unicode types in numpy would be a recipe for disaster when
>> one wants to transport unicode characters between platforms with
>> python interpreters compiled with different unicode sizes.
>
> I agree--it would be a nightmare.
>
>> Anyway, I don't know if the recommendation of compiling Python with
>> UCS4 is spread enough or not in the different distributions, but
>> people can easily check this with:
>>
>> >>> len(buffer(u"u"))
>> 4
>>
>> if the output of this is 4 (as in my example), then the interpreter is
>> using UCS4; if it is 2, it is using UCS2.
>
> No, it is not sufficiently widespread; Mandriva 2006 python is
> compiled for UCS2.

Also, the default build for MS Windows is compiled for UCS2.

How about always storing the data as UCS4 and, on a UCS2 python build,
converting it to UCS2 on the fly when extracting a python string from
the array? Isn't converting to UCS2 simply a matter of lopping off the
top two bytes? If so, the conversion should be nothing more than a check
that the value is not out of range, followed by the aforementioned
lopping.

-tim
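
For illustration, the check-then-lop conversion described above could look roughly like the sketch below. It is written for a current Python 3 interpreter, not the Python 2 builds discussed in the thread, and the name `ucs4_to_ucs2` and its `byteorder` argument are made up for this example; it is not numpy code.

```python
import struct

def ucs4_to_ucs2(data, byteorder="<"):
    """Narrow a buffer of 4-byte code units to 2-byte code units.

    Hypothetical helper for illustration only; not part of numpy.
    `byteorder` is a struct prefix: "<" little-endian, ">" big-endian.
    """
    if len(data) % 4:
        raise ValueError("UCS4 buffer length must be a multiple of 4")
    n = len(data) // 4
    # Read every 4-byte unit as an unsigned 32-bit integer.
    code_points = struct.unpack("%s%dI" % (byteorder, n), data)
    for cp in code_points:
        # The range check: a UCS2 unit cannot hold anything above 0xFFFF.
        if cp > 0xFFFF:
            raise ValueError("U+%X does not fit in UCS2" % cp)
    # The "lopping": repack each value as an unsigned 16-bit integer.
    return struct.pack("%s%dH" % (byteorder, n), *code_points)

# Usage: characters inside the Basic Multilingual Plane narrow cleanly.
ucs4 = "h\u00e9".encode("utf-32-le")
ucs2 = ucs4_to_ucs2(ucs4)
```

A value above 0xFFFF raises `ValueError` here. A real implementation might choose to emit a UTF-16 surrogate pair in that case, which is a design decision this sketch leaves open.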