Re: [cx-oracle-users] Antw: Bad conversion of a unicode value?

Anthony Tuininga wrote:
> Yes, I just looked through my e-mail and noticed this one. Some of the
> patch has already been applied but the addition of a new type that
> returns and accepts unicode I have not yet done. Recently I have been
> looking into this again (in conjunction with the ceODBC module) and
> ran into a few difficulties since most Unix builds are wide unicode
> and not narrow. Windows, of course, is different in this respect. Your
> code assumes a narrow build so I'd have to come up with a solution for
> that. I believe somewhere in this thread a solution to that problem
> has been given but I'll have to see if it really solves the problem or
> not. My apologies for dropping this one for so long. You should be
> more persistent and send it to me every few months. :-) If you don't,
> unless the patch is simple and obvious or I actually need it, it tends
> to get lost. And since Unicode is tough, I've also tended to avoid it
> somewhat. :-( Hopefully I'll be able to get some time to work on this
> in the next few weeks. Bug me again in January if you haven't heard
> anything about it before then.
>

Amaury's patch is better than mine, so forget my patch... But it's still
not complete. I hope I can find time to test and fix the issues, and I need
to do it on a bunch of boxes (HP-UX, AIX, Windows, Solaris, Linux etc., so
I might have to fix that wide vs. narrow issue anyway).
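
To make the wide vs. narrow issue concrete, here is a minimal sketch of how one could tell the two build types apart from Python itself. It assumes only the standard sys.maxunicode attribute; the helper names unicode_build_kind and utf16_code_units are made up for illustration and are not part of cx_Oracle or any patch in this thread.

```python
import sys


def unicode_build_kind():
    """Return 'narrow' for a UCS-2 build, 'wide' otherwise.

    Narrow builds report sys.maxunicode == 0xFFFF; wide builds
    (and every Python from 3.3 on) report 0x10FFFF.
    """
    return "narrow" if sys.maxunicode == 0xFFFF else "wide"


def utf16_code_units(text):
    """Count the UTF-16 code units needed for text.

    Going through an explicit UTF-16 encode gives the same answer on
    both build types, which len() on a unicode object does not.
    """
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    print(unicode_build_kind(), hex(sys.maxunicode))
    # A character outside the BMP needs two UTF-16 code units.
    print(utf16_code_units(u"\U0001F600"))
```

Code that sizes buffers for a UTF-16 interface would want the second number rather than len(), since the two only agree on narrow builds.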

Using OCINlsCreateEnv() might be better for Python use, because you can
then handle encodings much more gracefully and isolate yourself from broken
NLS_LANG settings on the client side. But it surely depends on how twisted
the usual Python usage is, and from what I have seen so far there is quite
some insane messing with encodings in standard strings in typical
multi-encoding code, including the popular but dangerous practice of changing
the default encoding (the value reported by sys.getdefaultencoding()) to
something nicer than ASCII.
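
As a rough illustration of the safer alternative to changing the default encoding, the sketch below converts at the boundaries with an explicitly named encoding. The helpers to_text and to_wire are hypothetical names for this example only, and the choice of UTF-8 on the wire is an assumption, not something cx_Oracle prescribes.

```python
def to_text(raw, encoding):
    """Decode incoming bytes with an explicitly named encoding.

    Text that is already decoded is passed through unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def to_wire(text, encoding="utf-8"):
    """Encode text for the client library (UTF-8 is assumed here)."""
    return text.encode(encoding)


if __name__ == "__main__":
    latin1_bytes = b"Stra\xdfe"              # "Straße" in ISO8859-1
    text = to_text(latin1_bytes, "iso8859-1")
    print(repr(to_wire(text)))               # the same word as UTF-8 bytes
```

The point of the design is that every conversion names its encoding, so nothing depends on a process-wide default that other code may have altered.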

AFAIK Unicode with Oracle is rather easy if you have a Unicode database
encoding (AL32UTF8), because then all conversions done by OCI are essentially
lossless. BUT if you have a database encoding like CP1252 or ISO8859-1, the
full nastiness of Oracle hits you, as all statements are converted via the
database encoding (which is sometimes a subset of the national encoding, which
is Unicode nowadays...), so all statements and results are stripped of
Unicode chars while passing through OCI unless specially protected or quoted.
For new databases, choosing anything but a Unicode database encoding is mostly
foolish, because life gets hard, but for legacy databases you often don't have
a choice (unless your legacy is 7-bit ASCII, where you can migrate to UTF-8
easily because it's a strict superset of ASCII).
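
The effect described above can be imitated in plain Python, without Oracle involved. This is only an analogy: it uses Python's own cp1252 and utf-8 codecs, and the "?" substitution is Python's errors="replace" behaviour, which is an assumption standing in for whatever replacement OCI actually performs.

```python
def round_trip(text, encoding):
    """Push text through an encoding and back, replacing what does not fit."""
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # The euro sign exists in CP1252, the capital omega does not.
    sample = u"5 \u20ac, 10 \u03a9"

    print(round_trip(sample, "utf-8") == sample)    # lossless
    print(round_trip(sample, "cp1252") == sample)   # omega is lost

    # Every 7-bit ASCII byte string is already valid UTF-8 with the same
    # meaning, which is why an ASCII legacy database migrates easily.
    ascii_bytes = bytes(bytearray(range(128)))
    print(ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8"))
```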

Michael

-- 
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3                 Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Registered office: Bremen | Managing director: Karl Heinz Zachries
Registered in the commercial register of the Amtsgericht Bremen under HRB 13215