Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?
From: Amaury F. d'A. <ama...@gm...> - 2007-11-28 17:08:40
Michael Schlenker wrote:
> Amaury Forgeot d'Arc schrieb:
> > matilda matilda wrote:
> >>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> >>> Michael Schlenker schrieb:
> >>> Okay, I got my test to work after patching cx_Oracle a little bit.
> >> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
> >>
> >>> From taking a closer look at the code, Unicode support is at best to be described as
> >>> 'rudimentary'; lots of fine points are still missing in there.
> >> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> >> be many questions to answer regarding byte streams, unicode streams, character set
> >> conversion (implicit/explicit), and character representation.
> >>
> >> See the change history to see when Anthony started to focus on character set
> >> conversion.
> >>
> >> Amaury Forgeot d'Arc, who also gives valuable input, is probably also interested
> >> in that topic, since he speaks and writes a language with many special characters.
> >
> > I indeed proposed a patch one year ago to support unicode.
> > It was against version 4.2.1; I attach it again in the hope it can be useful.
>
> Looks good. The minimal stuff I did goes a similar way, but I didn't use UTF16 yet,
> so there might be buffers with problems due to UTF-8's variable length...
>
> I'll try to use your stuff with a recent cx_Oracle if I find the time.
>
> And yes, it will break with UCS-4 builds of Python..., easy to fix though, if
> one uses AL32UTF8 instead of the UTF16 code and converts on read. That makes the code
> immune to possible big-endian vs. little-endian problems too (although I assume
> those are handled by OCI for UTF-16 anyway). But surrogates and the astral planes
> are treacherous ground anyway, so if the BMP works for a start, it's nice.
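The buffer-size concern in the quote (UTF-8 variable length, surrogates outside the BMP) can be made concrete with a small Python 3 sketch. This is not cx_Oracle code: `encoded_sizes` and `fits` are hypothetical helpers, and the "fixed bytes per character" sizing rule is an assumed naive strategy used only for illustration.

```python
# Hypothetical illustration: how many bytes/units one character needs in
# UTF-8 (as with AL32UTF8) versus UTF-16 (as with OCI_UTF16ID).
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]  # A, e-acute, euro sign, G clef (astral)


def encoded_sizes(ch: str) -> dict:
    """Return code points, UTF-8 bytes and UTF-16 code units for `ch`."""
    return {
        "code_points": len(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        # Two bytes per UTF-16 code unit; astral characters need a surrogate pair.
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
    }


def fits(text: str, encoding: str, bytes_per_char: int) -> bool:
    """Return True if `text` encoded with `encoding` fits in a buffer sized
    naively as len(text) * bytes_per_char bytes (an assumed sizing rule)."""
    return len(text.encode(encoding)) <= len(text) * bytes_per_char


if __name__ == "__main__":
    for ch in samples:
        print(repr(ch), encoded_sizes(ch))
```

With two bytes reserved per character, "\u20ac\u20ac" overflows as UTF-8 (6 bytes) but fits as UTF-16 (4 bytes), while a single astral character overflows even as UTF-16 because it takes a surrogate pair.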
I'm not sure I understand everything here, but it seemed to me that the correct way was to set the charsetId to OCI_UTF16ID, because it is completely independent of any NLS settings (there is no other possible value, btw). The values are then expressed in a Unicode-capable encoding, and that is enough.

To support UCS-4 builds, it should be enough to properly use functions like PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16 in StringVar_SetValue instead of the plain memcpy.

Sorry, I don't have the time at the moment, but I am sure you can do something with it.

-- 
Amaury Forgeot d'Arc
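The difference between decoding a UTF-16 buffer and copying it byte-for-byte can be sketched at the Python level. This is a hedged analogue, not cx_Oracle's C code: it assumes the OCI buffer holds UTF-16 without a BOM (little-endian by default here), and the helper names are hypothetical. `decode_utf16_buffer` plays the role PyUnicode_DecodeUTF16 would play in C, while `memcpy_into_ucs4` mimics what a plain memcpy into a 4-byte-per-character (UCS-4) buffer amounts to.

```python
from typing import List


def _codec(byte_order: str) -> str:
    # Assumption: the buffer carries no BOM, so the byte order must be given.
    return "utf-16-le" if byte_order == "little" else "utf-16-be"


def decode_utf16_buffer(buf: bytes, byte_order: str = "little") -> str:
    """Decode raw UTF-16 bytes into a str, joining surrogate pairs."""
    return buf.decode(_codec(byte_order))


def encode_utf16_buffer(text: str, byte_order: str = "little") -> bytes:
    """Encode a str into raw UTF-16 bytes (no BOM), splitting astral characters."""
    return text.encode(_codec(byte_order))


def memcpy_into_ucs4(buf: bytes) -> List[int]:
    """Read each 4-byte group of a UTF-16 buffer as one code point, which is
    what copying the bytes into a UCS-4 character array effectively does."""
    return [int.from_bytes(buf[i:i + 4], "little") for i in range(0, len(buf), 4)]


if __name__ == "__main__":
    raw = encode_utf16_buffer("AB")
    print(decode_utf16_buffer(raw))               # proper decode: the original text
    print([hex(cp) for cp in memcpy_into_ucs4(raw)])  # two UTF-16 units fused into one bogus value
```

Decoding keeps the text intact regardless of how wide the Python build's characters are, whereas the raw copy fuses two UTF-16 units ("A" and "B") into the single value 0x420041, which is not even a valid code point.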