Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?
From: Michael S. <ms...@co...> - 2007-11-30 11:31:45
Anthony Tuininga wrote:
> Yes, I just looked through my e-mail and noticed this one. Some of the
> patch has already been applied but the addition of a new type that
> returns and accepts unicode I have not yet done. Recently I have been
> looking into this again (in conjunction with the ceODBC module) and
> ran into a few difficulties since most Unix builds are wide unicode
> and not narrow. Windows, of course, is different in this respect. Your
> code assumes a narrow build so I'd have to come up with a solution for
> that. I believe somewhere in this thread a solution to that problem
> has been given but I'll have to see if it really solves the problem or
> not. My apologies for dropping this one for so long. You should be
> more persistent and send it to me every few months. :-) If you don't,
> unless the patch is simple and obvious or I actually need it, it tends
> to get lost. And since Unicode is tough, I've also tended to avoid it
> somewhat. :-( Hopefully I'll be able to get some time to work on this
> in the next few weeks. Bug me again in January if you haven't heard
> anything about it before then.
>

Amaury's patch is better than mine, so forget my patch... But it's still not complete. I hope I can find time to test and fix the issues, and I need to do it on a bunch of boxes (HP-UX, AIX, Windows, Solaris, Linux etc.), so I might have to fix that wide vs. narrow issue anyway.

Using OCIEnvNlsCreate() might be better for Python use, because you can then handle encodings much more gracefully and isolate yourself from broken NLS_LANG settings on the client side. But it surely depends on how twisted the usual Python usage is, and from what I have seen up to now there is quite some insane messing with encodings in standard strings in typical multi-encoding code, including the popular but dangerous practice of changing the default encoding (the value reported by sys.getdefaultencoding()) to something nicer than ASCII.
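For readers unfamiliar with the wide vs. narrow issue mentioned above, here is a minimal Python sketch of how the build type can be detected and why it matters. The helper names `is_narrow_build` and `utf16_code_units` are made up for this illustration and are not part of cx_Oracle or of either patch. On a narrow build `sys.maxunicode` is 0xFFFF and characters outside the Basic Multilingual Plane are stored as surrogate pairs; on a wide build (and on every Python from 3.3 onwards) it is 0x10FFFF.

```python
import sys


def is_narrow_build():
    """Return True on a narrow (UTF-16 based) Python build.

    Python 3.3 and later always report 0x10FFFF here, so this only
    returns True on older narrow builds (e.g. Python 2 on Windows).
    """
    return sys.maxunicode == 0xFFFF


def utf16_code_units(text):
    """Count the UTF-16 code units needed for `text`.

    This is the length a narrow build would report, and the buffer size
    (in 2-byte units) that a UTF-16 based C interface would need.
    """
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
    print("narrow build:", is_narrow_build())
    print("UTF-16 code units for the clef:", utf16_code_units(clef))
```

Code that assumes one Python character equals one 2-byte unit works on a narrow build but miscounts on a wide one, which is the kind of assumption a portable patch has to avoid.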
AFAIK Unicode with Oracle is rather easy if you have a Unicode database encoding (AL32UTF8), because then all conversions done by OCI are essentially lossless. BUT if you have a database encoding like CP1252 or ISO8859-1, the full nastiness of Oracle hits you, as all statements are converted via the database encoding (which is sometimes a subset of the national encoding, which is Unicode nowadays...), so all statements and results are stripped of Unicode characters while passing through OCI unless specially protected or quoted.

For new databases, choosing anything but a Unicode database encoding is mostly foolish, because life gets hard, but for legacy databases you often don't have a choice (unless your legacy is 7-bit ASCII, where you can migrate to UTF-8 easily because it's a strict superset of ASCII).

Michael

-- 
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Registered office: Bremen | Managing director: Karl Heinz Zachries
Registered in the commercial register of the Bremen district court under HRB 13215
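As an illustration of the point above about non-Unicode database encodings, the following sketch uses Python's own codecs to simulate what happens when text is forced through a narrower character set. It does not call Oracle or OCI at all; the `round_trip` helper is hypothetical, and Oracle's actual replacement behaviour may differ from Python's "?" substitution.

```python
def round_trip(text, encoding):
    """Encode `text` to `encoding` and decode it back.

    Characters the target encoding cannot represent are replaced
    (Python's codecs substitute "?"), which mimics a lossy conversion
    through a non-Unicode database character set.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # sharp s, euro sign and one CJK character
    sample = "Stra\u00dfe \u20ac \u4e2d"

    for enc in ("utf-8", "cp1252", "latin-1"):
        result = round_trip(sample, enc)
        status = "lossless" if result == sample else "lossy"
        print(enc, status, ascii(result))

    # 7-bit ASCII text has identical bytes in UTF-8, which is why an
    # ASCII-only legacy database can move to UTF-8 without conversion.
    print("plain ascii".encode("ascii") == "plain ascii".encode("utf-8"))
```

UTF-8 returns the sample unchanged, CP1252 keeps the sharp s and the euro sign but loses the CJK character, and ISO8859-1 (latin-1) loses the euro sign as well.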