Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?


Amaury Forgeot d'Arc wrote:
> Michael Schlenker wrote:
>> Amaury Forgeot d'Arc wrote:
>>> matilda matilda wrote:
>>>>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
>>>>> Michael Schlenker wrote:
>>>>> Okay, I got my test to work after patching cx_Oracle a little bit.
>>>> Anthony will be happy to hear that.  ;-) Anthony: Are you still here?
>>>>
>>>>> From taking a closer look at the code, Unicode support is at best to be
>>>>> described as 'rudimentary'; lots of fine points are still missing in there.
>>>> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
>>>> be many questions to answer regarding byte streams, unicode streams,
>>>> character set conversion (implicit/explicit), character representation.
>>>>
>>>> See the change history to see when Anthony started to focus on character set
>>>> conversion.
>>>>
>>>> Amaury Forgeot d'Arc, who also gives valuable input, is probably also
>>>> interested in that topic, speaking and writing a language with many
>>>> special characters.
>>> I indeed proposed a patch one year ago, to support unicode.
>>> It was against version 4.2.1; I attach it again in the hope it can be useful.
>> Looks good, the minimal stuff I did goes a similar way but I didn't use UTF16
>> yet, so there might be buffers with problems due to UTF-8 variable length...
>>
>> I'll try to use your stuff with a recent cx_Oracle if I find the time.
>>
>> And yes, it will break with UCS-4 builds of Python... easy to fix though, if
>> one uses AL32UTF8 instead of the UTF16 code and converts on read. That makes
>> the code immune against possible big-endian vs little-endian problems too
>> (although I assume those are handled by OCI for UTF-16 anyway). But surrogates
>> and the astral planes are treacherous ground anyway, so if the BMP works for a
>> start it's nice.
>
> I'm not sure I understand everything here, but it seemed to me that
> the correct way was to set the charsetId to OCI_UTF16ID, because it is
> completely independent of any NLS settings (there is no other possible
> value, btw).
I don't think so. The comments at least tell me that I can feed in
any valid Oracle charset (probably after converting the name to an id via
ub2 OCINlsCharSetNameToId(dvoid *envhp, const oratext *name);
) and that OCI_UTF16ID is just the only one that cannot be specified for
NLS_LANG. So you can choose the one you want and let OCI deal with the
conversion. OCI_UTF16ID is just the one used in the OCI docs because it's
very convenient when working with Windows wchar_t. So it's probably the
recommended value for charsetId, but not the only possible one.
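
To make the buffer-size side of that choice concrete, here is a small Python
sketch (not cx_Oracle or OCI code; the helper name encoded_sizes is made up for
this illustration). It only shows that the same text needs a different number
of bytes depending on the encoding, and that UTF-16 additionally has a byte
order, which is what the buffers on the C side would have to account for.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in a few Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
        "utf-16-be": len(text.encode("utf-16-be")),
    }


sample = "Stra\u00dfe"  # "Straße": 6 characters, one of them non-ASCII
sizes = encoded_sizes(sample)

# UTF-8 is variable length: the sharp s takes 2 bytes, the rest 1 byte each.
# UTF-16 uses 2 bytes for every BMP character, in either byte order.
print(sizes)

# The byte order only changes the layout, not the decoded text.
little = sample.encode("utf-16-le")
big = sample.encode("utf-16-be")
print(little[:2], big[:2])
print(little.decode("utf-16-le") == big.decode("utf-16-be"))
```

Whichever charset id is picked, the decoded text is the same; only the size
and layout of the intermediate buffer differ.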

> The values are expressed in a unicode-capable encoding,
> and this is enough.
Yes.
>
> To comply with UCS-4 builds, it should be enough to properly use
> functions like PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16 in
> StringVar_SetValue, instead of the plain memcpy.
Yes.
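
A rough Python sketch of why a plain copy is not enough once characters outside
the BMP show up (this is an illustration only, not the cx_Oracle code; the
helper names utf16_code_units and naive_widen are invented here, and
naive_widen merely approximates "treat every 16-bit unit as one character"):

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of `text` encoded as little-endian UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def naive_widen(units):
    """Treat each 16-bit unit as a full code point (no surrogate handling)."""
    return "".join(chr(unit) for unit in units)


text = "a\U0001D11E"  # 'a' followed by U+1D11E, a character outside the BMP
units = utf16_code_units(text)

# Three code units for two characters: U+1D11E becomes a surrogate pair.
print([hex(unit) for unit in units])

# Copying unit by unit leaves two lone surrogates instead of one character.
broken = naive_widen(units)
print(len(broken), broken == text)

# A real UTF-16 decode joins the surrogate pair again.
decoded = text.encode("utf-16-le").decode("utf-16-le")
print(len(decoded), decoded == text)
```

For BMP-only data both paths give the same result, which is why the problem
only shows up with surrogates.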

> Sorry I don't have the time at the moment, but I am sure you can do
> something with it.
If I find the time I will, yes. Thanks again for the patch.

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215