Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?
From: Michael S. <ms...@co...> - 2007-11-28 17:20:54
Amaury Forgeot d'Arc wrote:
> Michael Schlenker wrote:
>> Amaury Forgeot d'Arc wrote:
>>> matilda matilda wrote:
>>>>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
>>>>> Michael Schlenker wrote:
>>>>> Okay, I got my test to work after patching cx_Oracle a little bit.
>>>> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
>>>>
>>>>> From taking a closer look at the code, Unicode support is at best to be described as
>>>>> 'rudimentary'; lots of fine points are still missing in there.
>>>> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
>>>> be many questions to answer regarding byte streams, unicode streams, character set
>>>> conversion (implicit/explicit), and character representation.
>>>>
>>>> See the change history to see when Anthony started to focus on character set
>>>> conversion.
>>>>
>>>> Amaury Forgeot d'Arc, who also gives valuable input, is probably also interested
>>>> in that topic, since he speaks and writes a language with many special characters.
>>> I indeed proposed a patch one year ago to support unicode.
>>> It was against version 4.2.1; I attach it again in the hope it can be useful.
>> Looks good. The minimal stuff I did goes a similar way, but I didn't use UTF16 yet,
>> so there might be buffers with problems due to UTF-8's variable length...
>>
>> I'll try to use your stuff with a recent cx_Oracle if I find the time.
>>
>> And yes, it will break with UCS-4 builds of Python..., easy to fix though, if
>> one uses AL32UTF8 instead of the UTF16 code and converts on read. That makes the code
>> immune against possible big-endian vs. little-endian problems too (although I assume
>> those are handled by OCI for UTF-16 anyway). But surrogates and the astral planes
>> are treacherous ground anyway, so if the BMP works for a start, that's nice.
>
> I'm not sure I understand everything here, but it seemed to me that
> the correct way was to set the charsetId to OCI_UTF16ID, because it is
> completely independent of any NLS settings (there is no other possible
> value, btw).

I don't think so. The comments at least tell me that I can feed in all valid
Oracle charsets (probably after converting the name string to an id via
ub2 OCINlsCharSetNameToId(dvoid *envhp, const oratext *name); ) and that
OCI_UTF16ID is just the only one that cannot be specified for NLS_LANG.

So you can choose the one you want and let OCI deal with the conversion.
OCI_UTF16ID is just the one used in the OCI docs because it's very convenient
when working with Windows wchar_t. So it's probably the recommended value for
charsetId, but not the only possible one.

> The values are expressed in a unicode-capable encoding,
> and this is enough.

Yes.

>
> To comply with UCS-4 builds, it should be enough to properly use
> functions like PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16 in
> StringVar_SetValue, instead of the plain memcpy.

Yes.

> Sorry I don't have the time at the moment, but I am sure you can do
> something with it.

If I find the time I will, yes.
Thanks again for the patch.

Michael

-- 
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Registered office: Bremen | Managing director: Karl Heinz Zachries
Entered in the commercial register of the Amtsgericht Bremen under HRB 13215
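
For anyone following the UCS-4 point: a minimal Python 3 sketch (not cx_Oracle code) of why copying UTF-16 code units one-to-one into wider storage only works for the BMP, and why a real UTF-16 decode step is needed for anything beyond it. The helper names naive_widen and proper_decode are made up for illustration; naive_widen only imitates what a memcpy-style copy would amount to on a UCS-4 build.

```python
def naive_widen(raw_utf16le):
    """Treat every 16-bit code unit as one code point (no surrogate handling).

    Hypothetical helper: this imitates copying UTF-16 units straight into
    4-byte-per-character storage without decoding them.
    """
    return [
        int.from_bytes(raw_utf16le[i:i + 2], "little")
        for i in range(0, len(raw_utf16le), 2)
    ]


def proper_decode(raw_utf16le):
    """Decode UTF-16 properly, so a surrogate pair becomes one code point."""
    return [ord(ch) for ch in raw_utf16le.decode("utf-16-le")]


# BMP-only text: both approaches give the same code points.
bmp_raw = "Stra\u00dfe".encode("utf-16-le")
assert naive_widen(bmp_raw) == proper_decode(bmp_raw)

# A character outside the BMP (U+1D11E, MUSICAL SYMBOL G CLEF) is stored
# as two UTF-16 code units, so the naive copy yields two bogus "characters".
astral_raw = "\U0001D11E".encode("utf-16-le")
print([hex(u) for u in naive_widen(astral_raw)])
print([hex(c) for c in proper_decode(astral_raw)])
```

In C the decode step would correspond to using something like PyUnicode_DecodeUTF16 rather than a raw copy, as suggested in the quoted text above.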