Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?

Michael Schlenker wrote:
> Amaury Forgeot d'Arc wrote:
> > matilda matilda wrote:
> >>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> >>> Michael Schlenker wrote:
> >>> Okay, I got my test to work after patching cx_Oracle a little bit.
> >> Anthony will be happy to hear that.  ;-) Anthony: Are you still here?
> >>
> >>> From a closer look at the code, Unicode support is at best described as
> >>> 'rudimentary'; lots of fine points are still missing in there.
> >> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> >> be many questions to answer regarding byte streams, unicode streams, character set
> >> conversion (implicit/explicit), and character representation.
> >>
> >> See the change history to see when Anthony started to focus on character set
> >> conversion.
> >>
> >> Amaury Forgeot d'Arc, who also gives valuable input, is probably also interested
> >> in that topic, since he speaks and writes a language with many special characters.
> >
> > I did propose a patch a year ago to support unicode.
> > It was against version 4.2.1; I attach it again in the hope it can be useful.
>
> Looks good. The minimal stuff I did goes a similar way, but I didn't use UTF16 yet,
> so there might be buffers with problems due to UTF-8's variable length...
>
> I'll try to use your stuff with a recent cx_Oracle if I find the time.
>
> And yes, it will break with UCS-4 builds of Python... easy to fix though, if
> one uses AL32UTF8 instead of the UTF16 code and converts on read. That makes the code
> immune to possible big-endian vs. little-endian problems too (although I assume
> those are handled by OCI for UTF-16 anyway). But surrogates and the astral planes
> are treacherous ground anyway, so if the BMP works for a start, that's nice.

I'm not sure I understand everything here, but it seemed to me that
the correct way was to set the charsetId to OCI_UTF16ID, because it is
completely independent of any NLS settings (there is no other possible
value, btw). The values are expressed in a unicode-capable encoding,
and this is enough.

To support UCS-4 builds, it should be enough to properly use
functions like PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16 in
StringVar_SetValue, instead of the plain memcpy.
Sorry, I don't have the time at the moment, but I am sure you can do
something with it.
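
To illustrate why the plain memcpy is not enough on a UCS-4 build, here is a
rough sketch in Python rather than C. The helper names are made up for the
example and are not part of cx_Oracle; the real change would go into the C
code of StringVar_SetValue. It assumes a BOM-less UTF-16 buffer, which is what
I would expect with OCI_UTF16ID.

```python
import struct


def text_to_utf16_buffer(text, byteorder="little"):
    """Encode text as BOM-less UTF-16, the form a UTF-16 bind buffer would hold."""
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return text.encode(codec)


def utf16_buffer_to_text(buf, byteorder="little"):
    """Decode a BOM-less UTF-16 buffer; a surrogate pair becomes one code point."""
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return buf.decode(codec)


def naive_ucs4_view(buf):
    """Reinterpret the same bytes as little-endian 32-bit units, which is
    roughly what a plain memcpy into UCS-4 storage amounts to."""
    padded = buf + b"\x00" * (-len(buf) % 4)  # pad to a multiple of 4 bytes
    return list(struct.unpack("<%dI" % (len(padded) // 4), padded))


if __name__ == "__main__":
    sample = "a\u00e9\U0001d11e"  # ASCII, Latin-1 and one astral character
    buf = text_to_utf16_buffer(sample)
    # Number of code points vs. number of bytes in the UTF-16 buffer.
    print(len(sample), len(buf))
    # Proper decoding round-trips, including the surrogate pair.
    print(utf16_buffer_to_text(buf) == sample)
    # The memcpy-style view fuses two UTF-16 units into one bogus value.
    print([hex(u) for u in naive_ucs4_view(text_to_utf16_buffer("AB"))])
```

The point is only that the buffer length in bytes and the number of characters
no longer match once the internal representation is not 2 bytes wide, so the
conversion has to go through a real decode/encode step.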

-- 
Amaury Forgeot d'Arc