Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?
From: Michael S. <ms...@co...> - 2007-11-28 16:46:28
Amaury Forgeot d'Arc wrote:
> matilda matilda wrote:
>>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
>>> Michael Schlenker wrote:
>>> Okay, I got my test to work after patching cx_Oracle a little bit.
>> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
>>
>>> From taking a closer look at the code, Unicode support is at best to be described as
>>> 'rudimentary'; lots of fine points are still missing in there.
>> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
>> be many questions to answer regarding byte streams, unicode streams, character set
>> conversion (implicit/explicit), character representation.
>>
>> See the change history to see when Anthony started to focus on character set
>> conversion.
>>
>> Amaury Forgeot d'Arc, who also gives valuable input, is probably also interested
>> in that topic, since he speaks and writes a language with many special characters.
>
> I indeed proposed a patch one year ago, to support unicode.
> It was against version 4.2.1; I attach it again in the hope it can be useful.

Looks good. The minimal stuff I did goes a similar way, but I didn't use UTF-16 yet,
so some buffers might have problems due to the variable length of UTF-8...
I'll try to use your patch with a recent cx_Oracle if I find the time.

And yes, it will break with UCS-4 builds of Python. That is easy to fix, though, if one
uses AL32UTF8 instead of the UTF-16 code and converts on read. That also makes the code
immune to possible big-endian vs. little-endian problems (although I assume those are
handled by OCI for UTF-16 anyway).

But surrogates and the astral planes are treacherous ground anyway, so if the BMP works
for a start, that's nice.

Michael
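
To make the buffer-size point concrete, here is a minimal sketch of how many bytes a single character needs in each encoding. It is plain modern Python 3 (an assumption; the thread itself predates it) and does not touch cx_Oracle or OCI. The helper name `encoded_sizes` and the sample characters are purely illustrative.

```python
# Illustrative only: byte length of one character in three encodings.
# `encoded_sizes` is a hypothetical helper, not part of cx_Oracle.

def encoded_sizes(ch):
    """Return the encoded byte length of a one-character string."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # Explicit byte order, so no BOM is prepended to the result.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

samples = {
    "ASCII a": "a",
    "Latin u-umlaut": "\u00fc",
    "BMP euro sign": "\u20ac",
    "astral U+1D11E": "\U0001D11E",  # outside the BMP
}

for label, ch in samples.items():
    print(label, encoded_sizes(ch))
```

UTF-8 ranges from 1 to 4 bytes per character, so a buffer sized as "number of characters" is too small for anything beyond ASCII. UTF-16 is a fixed 2 bytes only inside the BMP; an astral character takes a surrogate pair, i.e. 4 bytes.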