Thread: [cx-oracle-users] Bad conversion of a unicode value?
Brought to you by:
atuining
From: Michael S. <ms...@co...> - 2007-11-26 17:44:08
|
Hi,

I'm trying to use Unicode with cx_Oracle but get some strange results; maybe someone can enlighten me as to what goes wrong.

The setup:
  cx_Oracle.version = 4.3.3
  OCI from instantclient10_1 (oci.dll Version 10.01.0000.0004)
  NLS_LANG set to GERMAN_GERMANY.AL32UTF8 before loading cx_Oracle
  Windows XP SP 2

From sqlplus:

SQL> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8MSWIN1252
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              10.2.0.3.0

I created a table with sqlplus with some Unicode characters in it:

CREATE TABLE nls_testing ( unicp NVARCHAR(10));
INSERT INTO nls_testing (nchar(8365)); -- euro symbol uses 2 utf-8 bytes
INSERT INTO nls_testing (nchar(9305)); -- some cjk ideogram uses 3 utf-8 bytes
COMMIT;

A select shows all is fine in the db; I see the correct Unicode codepoints.

SQL> SELECT asciistr(unicp) FROM nls_testing;

ASCIISTR(UNICP)
--------------------------------------------------------------------------------
\20AC
\2456

Now I tried to select the data with cx_Oracle from a Python shell (2.5.1, Windows):

>>> import cx_Oracle
>>> cx_Oracle.version
'4.3.3'
>>> conn = cx_Oracle.connect('...')
>>> cur = conn.cursor()
>>> conn.encoding
'UTF-8'
>>> conn.nencoding
'UTF-8'
>>> cur.execute('select unicp, asciistr(unicp) from nls_testing')
[<cx_Oracle.STRING with value None>, <cx_Oracle.STRING with value None>]
>>> rows = cur.fetchall()
>>> print rows
[('\xe2\x82\xac', '\\20AC'), ('\xc2\xbf', '\\2456')]
>>> for row in rows:
...     print repr(unicode(row[0].decode('utf-8')))
...
u'\u20ac'
u'\xbf'

So I get the euro symbol correctly delivered and decoded from UTF-8, but I get trash for the Chinese character. The expected UTF-8 hex value for Unicode codepoint 0x2456 is 'e29196'.

Any idea what goes wrong here?

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215
|
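The expected UTF-8 byte sequences for the two test codepoints can be double-checked outside the database with plain Python. This is a minimal sketch (Python 3 syntax; the thread itself uses Python 2):

```python
# Expected UTF-8 encodings for the two test codepoints:
# U+20AC (euro sign) and U+2456 (the second test character).
euro = "\u20ac".encode("utf-8")
other = "\u2456".encode("utf-8")

print(euro.hex())   # e282ac
print(other.hex())  # e29196
```

This confirms the values quoted in the mail: the euro sign encodes to 2 or 3 UTF-8 bytes depending on the codepoint, and 0x2456 must come back as the byte sequence e2 91 96 if nothing mangles it in transit.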
From: matilda m. <ma...@gr...> - 2007-11-27 08:28:19
|
>>> Michael Schlenker <ms...@co...> 26.11.2007 18:24 >>>
>
> I created a Table with sqlplus with some unicode characters in it:
>
> CREATE TABLE nls_testing ( unicp NVARCHAR(10));
> INSERT INTO nls_testing (nchar(8365)); -- euro symbol uses 2 utf-8 bytes
> INSERT INTO nls_testing (nchar(9305)); -- some cjk ideogram uses 3 utf-8 bytes
> COMMIT;

Hi Michael,

I just wanted to reproduce your problem, but:

1) CREATE TABLE nls_testing ( unicp NVARCHAR(10));
   doesn't work.

2) a) INSERT INTO nls_testing (nchar(8365))
   b) INSERT INTO nls_testing (nchar(9305))
   are syntactically wrong.

Decimal 8365 = Hex 20AD and NOT 20AC
Decimal 9305 = Hex 2459 and NOT 2456

Can you check this, so that we can have a look at it?

Best regards
Andreas Mock
|
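The decimal/hex mismatch pointed out above is easy to verify with any Python interpreter; a quick check:

```python
# The codepoints the first mail actually inserted vs. the ones it expected:
print(hex(8365))        # 0x20ad  (not U+20AC, the euro sign)
print(hex(9305))        # 0x2459  (not U+2456)

# The intended codepoints correspond to these decimal values:
print(int("20AC", 16))  # 8364
print(int("2456", 16))  # 9302
```

So the original INSERT statements were off by one (8365 instead of 8364) and by three (9305 instead of 9302), which is why the codepoints shown by ASCIISTR did not match.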
From: Michael S. <ms...@co...> - 2007-11-28 15:31:15
Attachments:
cx_Oracle.diff
|
Michael Schlenker schrieb:
> matilda matilda schrieb:
>>>>> Michael Schlenker <ms...@co...> 28.11.2007 12:13 >>>
>>> So i think cx_Oracle somehow mistakenly converts to database encoding somewhere in between.
>> Hi Michael,
>>
>> I'm not sure if this is a problem of cx_Oracle. That's the reason I wrote this little
>> Embedded C Program for that. On the C level I only get byte sequences anyway. So, I'm
>> sure that nothing but the Oracle client library is in between (hopefully ;-)).
>>

Okay, I got my test to work after patching cx_Oracle a little bit.

From taking a closer look at the code, Unicode support is at best to be described as 'rudimentary'; lots of fine points are still missing in there.

With that patch (against the cx_Oracle-4.3.3.zip file) at least my test runs through cleanly when I set the right environment. One surely can do better...

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215
|
From: Anthony T. <ant...@gm...> - 2007-11-29 15:46:20
|
Well, imagine my surprise to return after a short trip and find a dozen messages, including patches! :-)

As I've said before, I'm happy to accept patches and I'll certainly look at this one. I have some questions that would help me understand further what you are trying to accomplish. I'm only slowly beginning to get my head around Unicode -- the concept is very simple but the implementations are many and varied... :-(

1) You used OCIEnvNlsCreate() but then specified 0 for both the charset and the ncharset, which implies that you're not gaining anything by using it? Did you try this just using the normal OCIEnvCreate()?

2) You set OCI_ATTR_CHARSET_FORM but I am already doing that in (for example) StringVar.c. Did you remove it from that location? Are there more changes that you haven't specified?

Perhaps it will all become clear when I look at it further. Hopefully I'll get a chance to look at it today yet. What would be very helpful is a set of SQL statements and Python code that clearly demonstrates the problem, where applying the patch clearly removes that problem. I might be able to piece things together from the maelstrom of e-mails sent in the past couple of days, but one clear summarizing e-mail would be very helpful. :-) I'll then use that information as the basis for a test case as well.

On Nov 28, 2007 8:31 AM, Michael Schlenker <ms...@co...> wrote:
> Michael Schlenker schrieb:
> > matilda matilda schrieb:
> >>>>> Michael Schlenker <ms...@co...> 28.11.2007 12:13 >>>
> >>> So i think cx_Oracle somehow mistakenly converts to database encoding somewhere in between.
> >> Hi Michael,
> >>
> >> I'm not sure if this is a problem of cx_Oracle. That's the reason I wrote this little
> >> Embedded C Program for that. On the C level I only get byte sequences anyway. So, I'm
> >> sure that nothing but the Oracle client library is in between (hopefully ;-)).
> >>
> Okay, i got my test to work after patching cx_Oracle a little bit.
>
> From taking a closer look at the code Unicode support is at best to be described as
> 'rudimentary', lots of fine points still missing in there.
>
> With that patch (against the cx_Oracle-4.3.3.zip file) at least my test runs through
> cleanly when i set the right environment. One surely can do better....
>
> Michael
>
> --
> Michael Schlenker
> Software Engineer
>
> CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
> Wiener Straße 1-3               Fax:    +49 (421) 20153-41
> 28359 Bremen
> http://www.contact.de/          E-Mail: ms...@co...
>
> Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
> Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215
>
> diff -u5 cx_Oracle-4.3.3/Environment.c cx_Oracle-4.3.3_patched/Environment.c
> --- cx_Oracle-4.3.3/Environment.c       2007-10-02 05:46:58.000000000 +0200
> +++ cx_Oracle-4.3.3_patched/Environment.c       2007-11-28 15:26:15.339053700 +0100
> @@ -83,12 +83,12 @@
>      mode = OCI_OBJECT;
>      if (threaded)
>          mode |= OCI_THREADED;
>
>      // create the environment handle
> -    status = OCIEnvCreate(&environment->handle, mode, NULL, NULL,
> -            NULL, NULL, 0, NULL);
> +    status = OCIEnvNlsCreate(&environment->handle, mode, NULL, NULL,
> +            NULL, NULL, 0, NULL, 0, 0);
>      if (!environment->handle) {
>          Py_DECREF(environment);
>          PyErr_SetString(PyExc_RuntimeError,
>                  "Unable to acquire Oracle environment handle");
>          return NULL;
> Only in cx_Oracle-4.3.3_patched: Environment.c~
> diff -u5 cx_Oracle-4.3.3/Variable.c cx_Oracle-4.3.3_patched/Variable.c
> --- cx_Oracle-4.3.3/Variable.c  2007-10-02 05:46:58.000000000 +0200
> +++ cx_Oracle-4.3.3_patched/Variable.c  2007-11-28 16:27:31.292026100 +0100
> @@ -445,10 +445,11 @@
>          case SQLT_LNG:
>              return &vt_LongString;
>          case SQLT_AFC:
>              return &vt_FixedChar;
>          case SQLT_CHR:
> +        case SQLT_VCS:
>              if (charsetForm == SQLCS_NCHAR)
>                  return &vt_NationalCharString;
>              return &vt_String;
>          case SQLT_RDD:
>              return &vt_Rowid;
> @@ -669,11 +670,11 @@
>      if (Environment_CheckForError(cursor->environment, status,
>              "Variable_Define(): data type") < 0)
>          return NULL;
>
>      // retrieve character set form of the parameter
> -    if (dataType != SQLT_CHR && dataType != SQLT_CLOB) {
> +    if (dataType != SQLT_CHR && dataType != SQLT_VCS && dataType != SQLT_CLOB) {
>          charsetForm = SQLCS_IMPLICIT;
>      } else {
>          status = OCIAttrGet(param, OCI_HTYPE_DESCRIBE, (dvoid*) &charsetForm,
>                  0, OCI_ATTR_CHARSET_FORM, cursor->environment->errorHandle);
>          if (Environment_CheckForError(cursor->environment, status,
> @@ -736,10 +737,20 @@
>              "Variable_Define(): define") < 0) {
>          Py_DECREF(var);
>          return NULL;
>      }
>
> +    status = OCIAttrSet((void *) var->defineHandle,
> +            (ub4) OCI_HTYPE_DEFINE, (void *) &var->type->charsetForm,
> +            (ub4) 0, (ub4) OCI_ATTR_CHARSET_FORM,
> +            cursor->environment->errorHandle);
> +    if (Environment_CheckForError(var->environment, status,
> +            "Variable_Define(): attr_charset_form") < 0) {
> +        Py_DECREF(var);
> +        return NULL;
> +    }
> +
>      // call the procedure to set values after define
>      if (var->type->postDefineProc) {
>          if ((*var->type->postDefineProc)(var) < 0) {
>              Py_DECREF(var);
>              return NULL;
> @@ -831,11 +842,11 @@
>              "Variable_InternalBind(): set charset form") < 0)
>          return -1;
>      }
>
>      // set the max data size for strings
> -    if ((var->type == &vt_String || var->type == &vt_FixedChar)
> +    if ((var->type == &vt_String || var->type == &vt_FixedChar || var->type == &vt_NationalCharString)
>              && var->maxLength > var->type->elementLength) {
>          status = OCIAttrSet(var->bindHandle, OCI_HTYPE_BIND,
>                  (dvoid*) &var->type->elementLength, 0, OCI_ATTR_MAXDATA_SIZE,
>                  var->environment->errorHandle);
>          if (Environment_CheckForError(var->environment, status,
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell. From the desktop to the data center, Linux is going
> mainstream. Let it simplify your IT future.
> http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
> _______________________________________________
> cx-oracle-users mailing list
> cx-...@li...
> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users
>
>
|
From: matilda m. <ma...@gr...> - 2007-11-28 16:16:46
|
>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> Michael Schlenker schrieb:
> Okay, i got my test to work after patching cx_Oracle a little bit.

Anthony will be happy to hear that. ;-) Anthony: Are you still here?

> From taking a closer look at the code Unicode support is at best to be described as
> 'rudimentary', lots of fine points still missing in there.

I'm sure Anthony will agree. Especially with the upcoming Py3000 there will be many questions to answer regarding byte-streams, unicode-streams, character set conversion (implicit/explicit), and character representation.

See the change history for when Anthony started to focus on character set conversion.

Amaury Forgeot d'Arc, who also gives valuable input, is probably also interested in that topic, since he speaks and writes a language with many special characters.

> With that patch (against the cx_Oracle-4.3.3.zip file) at least my test runs through
> cleanly when i set the right environment. One surely can do better....

Can you easily enhance the tests in the test directory to unit test the charset conversion cases?

Best regards
Andreas
|
From: Anthony T. <ant...@gm...> - 2007-11-29 15:48:54
|
On Nov 28, 2007 9:16 AM, matilda matilda <ma...@gr...> wrote:
> >>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> > Michael Schlenker schrieb:
> > Okay, i got my test to work after patching cx_Oracle a little bit.
>
> Anthony will be happy to hear that. ;-) Anthony: Are you still here?

I am. Just got back from a short trip and a little busy. I promise to get to this as soon as I get a chance.

> > From taking a closer look at the code Unicode support is at best to be described as
> > 'rudimentary', lots of fine points still missing in there.
>
> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> be many questions to answer regarding byte-streams, unicode-streams, character set
> conversion (implicit/explicit), character representation.

Yes, I'll agree. My Unicode skills are only slowly ramping up. :-) It doesn't help that there are so many conflicting implementations to confuse me, too. Any pointers, patches, etc. are highly welcome. I'd really like to get this Unicode thing licked properly instead of the partially working code that is there now.

Anthony
|
From: Michael S. <ms...@co...> - 2007-11-28 16:28:10
|
matilda matilda schrieb:
>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
>
>> From taking a closer look at the code Unicode support is at best to be described as
>> 'rudimentary', lots of fine points still missing in there.
>
> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> be many questions to answer regarding byte-streams, unicode-streams, character set
> conversion (implicit/explicit), character representation.

I thought Py3k was supposed to solve those questions ;-)...

I usually write Tcl, where Unicode just works and you don't constantly step into the deep morass that the Python string/unicode dichotomy is. Basically it uses a model very close to the one now used by Py3k, but got it stable some years ago (around Tcl 8.2).

For cx_Oracle I would like to see maximum ease of use, e.g. expunge the evil NLS_LANG dependency, as it should never ever be needed if you're using Python (case 1: non-Unicode db encoding -> let cx_Oracle convert internally to the Python system encoding or to unicode, and be happy; case 2: Unicode db encoding -> let cx_Oracle just eat Python Unicode strings and spit them out). There are some pitfalls for literal SQL along the way, but I think you can write the interface in a way that you don't have to worry about all this stupid encoding conversion stuff forced upon us by Oracle, Python and other forces...

> See the change history to see when Anthony started to focus on character set
> conversion.
>
> Amaury Forgeot d'Arc who also gives valuable input is probably also interested
> in that topic while speaking and writing a language with many special characters.

Probably, but the main benefits come when you need to cross borders between Oracle's ISO charsets (e.g. Polish and French in one database).

>> With that patch (against the cx_Oracle-4.3.3.zip file) at least my test runs through
>> cleanly when i set the right environment. One surely can do better....
>
> Can you easily enhance the tests in the test directory to unit test the
> charset conversion cases?

Haven't looked there yet. But maybe I can rip something from my nosetest files when done.

Basically I first need to get permission to invest more time in these issues with cx_Oracle; it could happen that my manager decides to go a different route... :-(

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215
|
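The conversion model sketched in the message above (let the driver hand back bytes in one known encoding, and convert exactly once at the Python boundary) can be illustrated without a database. This is a minimal sketch in Python 3 syntax; the `db_encoding` value and the raw fetch bytes are hypothetical stand-ins for what a UTF-8-configured client connection would deliver:

```python
# Hypothetical: raw bytes as a UTF-8 client fetch would return them.
# In the proposed model this value would come from connection.encoding
# instead of the process-global NLS_LANG environment variable.
db_encoding = "utf-8"
raw = b"\xe2\x82\xac"  # euro sign, encoded as UTF-8

# One explicit conversion at the Python boundary, nothing implicit:
text = raw.decode(db_encoding)
print(repr(text))  # '€'
```

The point of the design is that the application never sees bytes in an unknown encoding: either the driver decodes for you, or it accepts and returns Unicode strings directly.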
From: Amaury F. d'A. <ama...@gm...> - 2007-11-28 16:33:51
|
Michael Schlenker wrote:
> > I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> > be many questions to answer regarding byte-streams, unicode-streams, character set
> > conversion (implicit/explicit), character representation.
>
> I thought Py3k should solve those questions ;-)...

It won't. Instead, it will force you to correctly answer all those questions. Hopefully it will come with a guide of "best practices" in this area.

--
Amaury Forgeot d'Arc
|
From: Amaury F. d'A. <ama...@gm...> - 2007-11-28 16:32:08
Attachments:
unicode2.diff
|
matilda matilda wrote:
> >>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> > Michael Schlenker schrieb:
> > Okay, i got my test to work after patching cx_Oracle a little bit.
>
> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
>
> > From taking a closer look at the code Unicode support is at best to be described as
> > 'rudimentary', lots of fine points still missing in there.
>
> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> be many questions to answer regarding byte-streams, unicode-streams, character set
> conversion (implicit/explicit), character representation.
>
> See the change history to see when Anthony started to focus on character set
> conversion.
>
> Amaury Forgeot d'Arc who also gives valuable input is probably also interested
> in that topic while speaking and writing a language with many special characters.

I indeed proposed a patch to support Unicode one year ago. It was against version 4.2.1; I attach it again in the hope it can be useful. Here are the comments I sent at the time:

The visible changes are:
- a new cx_Oracle.UNICODE variable type, which accepts and returns unicode values.
- NVARCHAR columns will return this variable type.
- a new test file: UnicodeVar.py covers the most common cases.

There are some caveats:
- I based my work on the new vt_NationalCharString type (and its charsetForm member). As there are no unit tests for it, I may have broken something there. Same for the new NCLOB type.
- Automatic conversion between string and unicode could be supported (it should be easy to add, by using connection.encoding).
- I suspect it will only work if Python is compiled in UCS2 mode (== sizeof(Py_UNICODE) must be 2).

--
Amaury Forgeot d'Arc
|
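The UCS2 caveat mentioned above can be probed from Python itself: on Python 2, `sys.maxunicode` distinguished narrow (UCS2, 0xFFFF) builds from wide (UCS4, 0x10FFFF) builds, and since PEP 393 (Python 3.3) the distinction no longer exists. A quick check, in Python 3 syntax:

```python
import sys

# Narrow (UCS2) Python 2 builds reported 0xffff here; wide (UCS4)
# builds reported 0x10ffff. On Python 3.3+ this is always 0x10ffff.
print(hex(sys.maxunicode))
```

On a narrow build, codepoints above U+FFFF were represented as surrogate pairs, which is exactly the case a 2-byte `Py_UNICODE` assumption in C extension code has to worry about.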
From: Michael S. <ms...@co...> - 2007-11-27 09:49:30
|
matilda matilda schrieb:
>>>> Michael Schlenker <ms...@co...> 26.11.2007 18:24 >>>
>> I created a Table with sqlplus with some unicode characters in it:
>>
>> CREATE TABLE nls_testing ( unicp NVARCHAR(10));
>> INSERT INTO nls_testing (nchar(8365)); -- euro symbol uses 2 utf-8 bytes
>> INSERT INTO nls_testing (nchar(9305)); -- some cjk ideogram uses 3 utf-8 bytes
>> COMMIT;
>
> Hi Michael,
>
> I just wanted to reproduce your problem, but:
> 1) CREATE TABLE nls_testing ( unicp NVARCHAR(10));
>    doesn't work.
>
> 2) a) INSERT INTO nls_testing (nchar(8365))
>    b) INSERT INTO nls_testing (nchar(9305))
>    are syntactically wrong.
>
> Decimal 8365 = Hex 20AD and NOT 20AC
> Decimal 9305 = Hex 2459 and NOT 2456

Okay, I'll use cut and paste now; I had a crappy console before. The values in my Python example are still correct, as are the NLS settings I reported.

SQL> create table nls_testing (unicp nvarchar2(10));

Tabelle wurde angelegt.

SQL> insert into nls_testing values (nchr(8364));

1 Zeile wurde erstellt.

SQL> insert into nls_testing values (nchr(9302));

1 Zeile wurde erstellt.

SQL> select asciistr(unicp), unicp from nls_testing;

ASCIISTR(UNICP)                                    UNICP
-------------------------------------------------- --------
\20AC                                              €
\2456                                              ¿

SQL> commit;

Transaktion mit COMMIT abgeschlossen.

As said, I get the first one correct, but the second one does not appear to reach Python intact.

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH           Tel.:   +49 (421) 20153-80
Wiener Straße 1-3               Fax:    +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215
|
From: matilda m. <ma...@gr...> - 2007-11-27 11:02:30
|
>>>> Michael Schlenker <ms...@co...> 27.11.2007 10:49 >>>
>> matilda matilda schrieb:
>>>>> Michael Schlenker <ms...@co...> 26.11.2007 18:24 >>>
>>> I created a Table with sqlplus with some unicode characters in it:
>>>
>>> CREATE TABLE nls_testing ( unicp NVARCHAR(10));
>>> INSERT INTO nls_testing (nchar(8365)); -- euro symbol uses 2 utf-8 bytes
>>> INSERT INTO nls_testing (nchar(9305)); -- some cjk ideogram uses 3 utf-8 bytes
>>> COMMIT;
>
> SQL> insert into nls_testing values (nchr(8364));
> SQL> insert into nls_testing values (nchr(9302));

Hi Michael,

ok, thanks. You see, there were subtle differences between the values you talked about the first and the second time. ;-) I just wanted to make sure that we're talking about the same thing.

Now I can reproduce the problem.

Best regards
Andreas Mock
|
From: matilda m. <ma...@gr...> - 2007-11-28 10:50:27
|
>>> Michael Schlenker <msc@contact.de> 27.11.2007 10:49 >>>
>
> SQL> select asciistr(unicp), unicp from nls_testing;
>
> ASCIISTR(UNICP)                                    UNICP
> -------------------------------------------------- --------
> \20AC                                              €
> \2456                                              ¿
>
> SQL> commit;
> Transaktion mit COMMIT abgeschlossen.
>
> As told, i get the first one correct, but the second does
> appear to not reach python intact.

Hi Michael,

just a short note in between: at the moment I can only investigate this problem in my spare time, so it will take some time in any case.

What I did: I created a short Embedded SQL C program to get the values from the database. Interestingly, I get the same byte sequences as I get from Python. My conclusion: there must be something wrong with the internal character set conversion between DB backend and DB client.

I also don't know at the moment how much the client environment (Linux for me) influences the results. E.g. I get "rubbish" for both values stored in your test table.

If you have new information, please let us know. Anthony is interested in improving the i18n and character set stuff in cx_Oracle. At the moment he seems a little bit busy. :-)

Best regards
Andreas Mock
|
From: Michael S. <ms...@co...> - 2007-11-28 11:13:17
|
matilda matilda schrieb:
>>>> Michael Schlenker <ms...@co...> 27.11.2007 10:49 >>>
>> SQL> select asciistr(unicp), unicp from nls_testing;
>>
>> ASCIISTR(UNICP)                                    UNICP
>> -------------------------------------------------- --------
>> \20AC                                              €
>> \2456                                              ¿
>>
>> SQL> commit;
>> Transaktion mit COMMIT abgeschlossen.
>>
>> As told, i get the first one correct, but the second does
>> appear to not reach python intact.
>
> Hi Michael,
>
> just a short information in between. At the moment I can only investigate
> this problem in spare time. That's the reason that it will last some time
> anyway.
>
> What I did: I created a short Embedded SQL C program to get the values
> from the database. Interestingly I get the same byte sequences as I get
> them from python. My conclusion: There must be something with the
> internal character set conversion between DB backend and DB client.
>
> I also don't know at the moment how much the client environment (Linux
> for me) influences the results. E.g. I get "rubbish" for both values
> stored in your test tables.
>
> If you have new informations please let us know. Anthony is interested
> in improving the i18n and character set stuff in cx_Oracle. At the moment
> he seems a little bit busy. :-)

I created a little test program to see where it starts to break; see the end of the message.

I find that many codepoints >= 128 (0x80) (i.e. everything outside 7-bit ASCII) are broken and don't make it through unharmed, but not all of them. The ones that make it through look suspiciously like CP1252, which is the database encoding (MSWINxxxx), but not the database national encoding. The first block is "0080..00FF; Latin-1 Supplement" from the Unicode character database, the second one is the upper part of "Latin Extended-A", then one char from "Latin Extended-B", then some from punctuation and some from currency symbols.

So I think cx_Oracle somehow mistakenly converts to the database encoding somewhere in between.

Michael

Here is the full list of non-ASCII that gets through unbroken (raw returned bytes shown as \xNN escapes):

Returned: ASCIISTR( \x7f ) UTF8( \x7f ) UNICODE( u'\x7f' )
Returned: ASCIISTR( \0081 ) UTF8( \xc3\x82\xc2\x81 ) UNICODE( u'\x81' )
Returned: ASCIISTR( \008D ) UTF8( \xc3\x82\xc2\x8d ) UNICODE( u'\x8d' )
Returned: ASCIISTR( \008F ) UTF8( \xc3\x82\xc2\x8f ) UNICODE( u'\x8f' )
Returned: ASCIISTR( \0090 ) UTF8( \xc3\x82\xc2\x90 ) UNICODE( u'\x90' )
Returned: ASCIISTR( \009D ) UTF8( \xc3\x82\xc2\x9d ) UNICODE( u'\x9d' )
Returned: ASCIISTR( \00A0 ) UTF8( \xc3\x82 ) UNICODE( u'\xa0' )
Returned: ASCIISTR( \00A1 ) UTF8( \xc3\x82\xc2\xa1 ) UNICODE( u'\xa1' )
Returned: ASCIISTR( \00A2 ) UTF8( \xc3\x82\xc2\xa2 ) UNICODE( u'\xa2' )
Returned: ASCIISTR( \00A3 ) UTF8( \xc3\x82\xc2\xa3 ) UNICODE( u'\xa3' )
Returned: ASCIISTR( \00A4 ) UTF8( \xc3\x82\xc2\xa4 ) UNICODE( u'\xa4' )
Returned: ASCIISTR( \00A5 ) UTF8( \xc3\x82\xc2\xa5 ) UNICODE( u'\xa5' )
Returned: ASCIISTR( \00A6 ) UTF8( \xc3\x82\xc2\xa6 ) UNICODE( u'\xa6' )
Returned: ASCIISTR( \00A7 ) UTF8( \xc3\x82\xc2\xa7 ) UNICODE( u'\xa7' )
Returned: ASCIISTR( \00A8 ) UTF8( \xc3\x82\xc2\xa8 ) UNICODE( u'\xa8' )
Returned: ASCIISTR( \00A9 ) UTF8( \xc3\x82\xc2\xa9 ) UNICODE( u'\xa9' )
Returned: ASCIISTR( \00AA ) UTF8( \xc3\x82\xc2\xaa ) UNICODE( u'\xaa' )
Returned: ASCIISTR( \00AB ) UTF8( \xc3\x82\xc2\xab ) UNICODE( u'\xab' )
Returned: ASCIISTR( \00AC ) UTF8( \xc3\x82\xc2\xac ) UNICODE( u'\xac' )
Returned: ASCIISTR( \00AD ) UTF8( \xc3\x82\xc2\xad ) UNICODE( u'\xad' )
Returned: ASCIISTR( \00AE ) UTF8( \xc3\x82\xc2\xae ) UNICODE( u'\xae' )
Returned: ASCIISTR( \00AF ) UTF8( \xc3\x82\xc2\xaf ) UNICODE( u'\xaf' )
Returned: ASCIISTR( \00B0 ) UTF8( \xc3\x82\xc2\xb0 ) UNICODE( u'\xb0' )
Returned: ASCIISTR( \00B1 ) UTF8( \xc3\x82\xc2\xb1 ) UNICODE( u'\xb1' )
Returned: ASCIISTR( \00B2 ) UTF8( \xc3\x82\xc2\xb2 ) UNICODE( u'\xb2' )
Returned: ASCIISTR( \00B3 ) UTF8( \xc3\x82\xc2\xb3 ) UNICODE( u'\xb3' )
Returned: ASCIISTR( \00B4 ) UTF8( \xc3\x82\xc2\xb4 ) UNICODE( u'\xb4' )
Returned: ASCIISTR( \00B5 ) UTF8( \xc3\x82\xc2\xb5 ) UNICODE( u'\xb5' )
Returned: ASCIISTR( \00B6 ) UTF8( \xc3\x82\xc2\xb6 ) UNICODE( u'\xb6' )
Returned: ASCIISTR( \00B7 ) UTF8( \xc3\x82\xc2\xb7 ) UNICODE( u'\xb7' )
Returned: ASCIISTR( \00B8 ) UTF8( \xc3\x82\xc2\xb8 ) UNICODE( u'\xb8' )
Returned: ASCIISTR( \00B9 ) UTF8( \xc3\x82\xc2\xb9 ) UNICODE( u'\xb9' )
Returned: ASCIISTR( \00BA ) UTF8( \xc3\x82\xc2\xba ) UNICODE( u'\xba' )
Returned: ASCIISTR( \00BB ) UTF8( \xc3\x82\xc2\xbb ) UNICODE( u'\xbb' )
Returned: ASCIISTR( \00BC ) UTF8( \xc3\x82\xc2\xbc ) UNICODE( u'\xbc' )
Returned: ASCIISTR( \00BD ) UTF8( \xc3\x82\xc2\xbd ) UNICODE( u'\xbd' )
Returned: ASCIISTR( \00BE ) UTF8( \xc3\x82\xc2\xbe ) UNICODE( u'\xbe' )
Returned: ASCIISTR( \00BF ) UTF8( \xc3\x82\xc2\xbf ) UNICODE( u'\xbf' )
Returned: ASCIISTR( \00C0 ) UTF8( \xc3\x83\xe2\x82\xac ) UNICODE( u'\xc0' )
Returned: ASCIISTR( \00C1 ) UTF8( \xc3\x83\xc2\x81 ) UNICODE( u'\xc1' )
Returned: ASCIISTR( \00C2 ) UTF8( \xc3\x83\xe2\x80\x9a ) UNICODE( u'\xc2' )
Returned: ASCIISTR( \00C3 ) UTF8( \xc3\x83\xc6\x92 ) UNICODE( u'\xc3' )
Returned: ASCIISTR( \00C4 ) UTF8( \xc3\x83\xe2\x80\x9e ) UNICODE( u'\xc4' )
Returned: ASCIISTR( \00C5 ) UTF8( \xc3\x83\xe2\x80\xa6 ) UNICODE( u'\xc5' )
Returned: ASCIISTR( \00C6 ) UTF8( \xc3\x83\xe2\x80\xa0 ) UNICODE( u'\xc6' )
Returned: ASCIISTR( \00C7 ) UTF8( \xc3\x83\xe2\x80\xa1 ) UNICODE( u'\xc7' )
Returned: ASCIISTR( \00C8 ) UTF8( \xc3\x83\xcb\x86 ) UNICODE( u'\xc8' )
Returned: ASCIISTR( \00C9 ) UTF8( \xc3\x83\xe2\x80\xb0 ) UNICODE( u'\xc9' )
Returned: ASCIISTR( \00CA ) UTF8( \xc3\x83\xc5\xa0 ) UNICODE( u'\xca' )
Returned: ASCIISTR( \00CB ) UTF8( \xc3\x83\xe2\x80\xb9 ) UNICODE( u'\xcb' )
Returned: ASCIISTR( \00CC ) UTF8( \xc3\x83\xc5\x92 ) UNICODE( u'\xcc' )
Returned: ASCIISTR( \00CD ) UTF8( \xc3\x83\xc2\x8d ) UNICODE( u'\xcd' )
Returned: ASCIISTR( \00CE ) UTF8( \xc3\x83\xc5\xbd ) UNICODE( u'\xce' )
Returned: ASCIISTR( \00CF ) UTF8( \xc3\x83\xc2\x8f ) UNICODE( u'\xcf' )
Returned: ASCIISTR( \00D0 ) UTF8( \xc3\x83\xc2\x90 ) UNICODE( u'\xd0' )
Returned: ASCIISTR( \00D1 ) UTF8( \xc3\x83\xe2\x80\x98 ) UNICODE( u'\xd1' )
Returned: ASCIISTR( \00D2 ) UTF8( \xc3\x83\xe2\x80\x99 ) UNICODE( u'\xd2' )
Returned: ASCIISTR( \00D3 ) UTF8( \xc3\x83\xe2\x80\x9c ) UNICODE( u'\xd3' )
Returned: ASCIISTR( \00D4 ) UTF8( \xc3\x83\xe2\x80\x9d ) UNICODE( u'\xd4' )
Returned: ASCIISTR( \00D5 ) UTF8( \xc3\x83\xe2\x80\xa2 ) UNICODE( u'\xd5' )
Returned: ASCIISTR( \00D6 ) UTF8( \xc3\x83\xe2\x80\x93 ) UNICODE( u'\xd6' )
Returned: ASCIISTR( \00D7 ) UTF8( \xc3\x83\xe2\x80\x94 ) UNICODE( u'\xd7' )
Returned: ASCIISTR( \00D8 ) UTF8( \xc3\x83\xcb\x9c ) UNICODE( u'\xd8' )
Returned: ASCIISTR( \00D9 ) UTF8( \xc3\x83\xe2\x84\xa2 ) UNICODE( u'\xd9' )
Returned: ASCIISTR( \00DA ) UTF8( \xc3\x83\xc5\xa1 ) UNICODE( u'\xda' )
Returned: ASCIISTR( \00DB ) UTF8( \xc3\x83\xe2\x80\xba ) UNICODE( u'\xdb' )
Returned: ASCIISTR( \00DC ) UTF8( \xc3\x83\xc5\x93 ) UNICODE( u'\xdc' )
Returned: ASCIISTR( \00DD ) UTF8( \xc3\x83\xc2\x9d ) UNICODE( u'\xdd' )
Returned: ASCIISTR( \00DE ) UTF8( \xc3\x83\xc5\xbe ) UNICODE( u'\xde' )
Returned: ASCIISTR( \00DF ) UTF8( \xc3\x83\xc5\xb8 ) UNICODE( u'\xdf' )
Returned: ASCIISTR( \00E0 ) UTF8( \xc3\x83 ) UNICODE( u'\xe0' )
Returned: ASCIISTR( \00E1 ) UTF8( \xc3\x83\xc2\xa1 ) UNICODE( u'\xe1' )
Returned: ASCIISTR( \00E2 ) UTF8( \xc3\x83\xc2\xa2 ) UNICODE( u'\xe2' )
Returned: ASCIISTR( \00E3 ) UTF8( \xc3\x83\xc2\xa3 ) UNICODE( u'\xe3' )
Returned: ASCIISTR( \00E4 ) UTF8( \xc3\x83\xc2\xa4 ) UNICODE( u'\xe4' )
Returned: ASCIISTR( \00E5 ) UTF8( \xc3\x83\xc2\xa5 ) UNICODE( u'\xe5' )
Returned: ASCIISTR( \00E6 ) UTF8( \xc3\x83\xc2\xa6 ) UNICODE( u'\xe6' )
Returned: ASCIISTR( \00E7 ) UTF8( \xc3\x83\xc2\xa7 ) UNICODE( u'\xe7' )
Returned: ASCIISTR( \00E8 ) UTF8( \xc3\x83\xc2\xa8 ) UNICODE( u'\xe8' )
Returned: ASCIISTR( \00E9 ) UTF8( \xc3\x83\xc2\xa9 ) UNICODE( u'\xe9' )
Returned: ASCIISTR( \00EA ) UTF8( \xc3\x83\xc2\xaa ) UNICODE( u'\xea' )
Returned: ASCIISTR( \00EB ) UTF8( \xc3\x83\xc2\xab ) UNICODE( u'\xeb' )
Returned: ASCIISTR( \00EC ) UTF8( \xc3\x83\xc2\xac ) UNICODE( u'\xec' )
Returned: ASCIISTR( \00ED ) UTF8( \xc3\x83\xc2\xad ) UNICODE( u'\xed' )
Returned: ASCIISTR( \00EE ) UTF8( \xc3\x83\xc2\xae ) UNICODE( u'\xee' )
Returned: ASCIISTR( \00EF ) UTF8( \xc3\x83\xc2\xaf ) UNICODE( u'\xef' )
Returned: ASCIISTR( \00F0 ) UTF8( \xc3\x83\xc2\xb0 ) UNICODE( u'\xf0' )
Returned: ASCIISTR( \00F1 ) UTF8( \xc3\x83\xc2\xb1 ) UNICODE( u'\xf1' )
Returned: ASCIISTR( \00F2 ) UTF8( \xc3\x83\xc2\xb2 ) UNICODE( u'\xf2' )
Returned: ASCIISTR( \00F3 ) UTF8( \xc3\x83\xc2\xb3 ) UNICODE( u'\xf3' )
Returned: ASCIISTR( \00F4 ) UTF8( \xc3\x83\xc2\xb4 ) UNICODE( u'\xf4' )
Returned: ASCIISTR( \00F5 ) UTF8( \xc3\x83\xc2\xb5 ) UNICODE( u'\xf5' )
Returned: ASCIISTR( \00F6 ) UTF8( \xc3\x83\xc2\xb6 ) UNICODE( u'\xf6' )
Returned: ASCIISTR( \00F7 ) UTF8( \xc3\x83\xc2\xb7 ) UNICODE( u'\xf7' )
Returned: ASCIISTR( \00F8 ) UTF8( \xc3\x83\xc2\xb8 ) UNICODE( u'\xf8' )
Returned: ASCIISTR( \00F9 ) UTF8( \xc3\x83\xc2\xb9 ) UNICODE( u'\xf9' )
Returned: ASCIISTR( \00FA ) UTF8( \xc3\x83\xc2\xba ) UNICODE( u'\xfa' )
Returned: ASCIISTR( \00FB ) UTF8( \xc3\x83\xc2\xbb ) UNICODE( u'\xfb' )
Returned: ASCIISTR( \00FC ) UTF8( \xc3\x83\xc2\xbc ) UNICODE( u'\xfc' )
Returned: ASCIISTR( \00FD ) UTF8( \xc3\x83\xc2\xbd ) UNICODE( u'\xfd' )
Returned: ASCIISTR( \00FE ) UTF8( \xc3\x83\xc2\xbe ) UNICODE( u'\xfe' )
Returned: ASCIISTR( \00FF ) UTF8( \xc3\x83\xc2\xbf ) UNICODE( u'\xff' )
Returned: ASCIISTR( \0152 ) UTF8( \xc3\x85\xe2\x80\x99 ) UNICODE( u'\u0152' )
Returned: ASCIISTR( \0153 ) UTF8( \xc3\x85\xe2\x80\x9c ) UNICODE( u'\u0153' )
Returned: ASCIISTR( \0160 ) UTF8( \xc3\x85 ) UNICODE( u'\u0160' )
Returned: ASCIISTR( \0161 ) UTF8( \xc3\x85\xc2\xa1 ) UNICODE( u'\u0161' )
Returned: ASCIISTR( \0178 ) UTF8( \xc3\x85\xc2\xb8 ) UNICODE( u'\u0178' )
Returned: ASCIISTR( \017D ) UTF8( \xc3\x85\xc2\xbd ) UNICODE( u'\u017d' )
Returned: ASCIISTR( \017E ) UTF8( \xc3\x85\xc2\xbe ) UNICODE( u'\u017e' )
Returned: ASCIISTR( \0192 ) UTF8( \xc3\x86\xe2\x80\x99 ) UNICODE( u'\u0192' )
Returned: ASCIISTR( \02C6 ) UTF8( \xc3\x8b\xe2\x80\xa0 ) UNICODE( u'\u02c6' )
Returned: ASCIISTR( \02DC ) UTF8( \xc3\x8b\xc5\x93 ) UNICODE( u'\u02dc' )
Returned: ASCIISTR( \2013 ) UTF8( \xc3\xa2\xe2\x82\xac\xe2\x80\x9c ) UNICODE( u'\u2013' )
Returned: ASCIISTR( \2014 ) UTF8( \xc3\xa2\xe2\x82\xac\xe2\x80\x9d ) UNICODE( u'\u2014' )
Returned: ASCIISTR( \2018 ) UTF8( \xc3\xa2\xe2\x82\xac\xcb\x9c ) UNICODE( u'\u2018' )
Returned: ASCIISTR( \2019 ) UTF8( \xc3\xa2\xe2\x82\xac\xe2\x84\xa2 ) UNICODE( u'\u2019' )
Returned: ASCIISTR( \201A ) UTF8( \xc3\xa2\xe2\x82\xac\xc5\xa1 ) UNICODE( u'\u201a' )
Returned: ASCIISTR( \201C ) UTF8( \xc3\xa2\xe2\x82\xac\xc5\x93 ) UNICODE( u'\u201c' )
Returned: ASCIISTR( \201D ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\x9d ) UNICODE( u'\u201d' )
Returned: ASCIISTR( \201E ) UTF8( \xc3\xa2\xe2\x82\xac\xc5\xbe ) UNICODE( u'\u201e' )
Returned: ASCIISTR( \2020 ) UTF8( \xc3\xa2\xe2\x82\xac ) UNICODE( u'\u2020' )
Returned: ASCIISTR( \2021 ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\xa1 ) UNICODE( u'\u2021' )
Returned: ASCIISTR( \2022 ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\xa2 ) UNICODE( u'\u2022' )
Returned: ASCIISTR( \2026 ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\xa6 ) UNICODE( u'\u2026' )
Returned: ASCIISTR( \2030 ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\xb0 ) UNICODE( u'\u2030' )
Returned: ASCIISTR( \2039 ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\xb9 ) UNICODE( u'\u2039' )
Returned: ASCIISTR( \203A ) UTF8( \xc3\xa2\xe2\x82\xac\xc2\xba ) UNICODE( u'\u203a' )
Returned: ASCIISTR( \20AC ) UTF8( \xc3\xa2\xe2\x80\x9a\xc2\xac ) UNICODE( u'\u20ac' )
Returned: ASCIISTR( \2122 ) UTF8( \xc3\xa2\xe2\x80\x9e\xc2\xa2 ) UNICODE( u'\u2122' )

The test program:

import os

# path to oci.dll
os.environ['PATH'] = 'c:\\instantclient10_1'
# path to tnsnames.ora
os.environ['TNS_ADMIN'] = 'U:\\'
os.environ['NLS_LANG'] = 'GERMAN_GERMANY.AL32UTF8'

import cx_Oracle

conn = cx_Oracle.connect("scott/tiger@DEV")

# iterate over the Unicode BMP
cur = conn.cursor()
stmt = "SELECT ASCIISTR(nchr(:codepoint)), nchr(:codepoint) FROM DUAL"
cur.prepare(stmt)
fail = 0
# loop from 0 to the start of the surrogate pairs
for i in xrange(55295):
    c = unichr(i)
    cutf8 = c.encode('utf-8')
    cur.execute(None, {'codepoint': i})
    asciiret, utfret = cur.fetchone()
    cret = utfret.decode('utf-8')
    if cret != c:
        print "Difference for codepoint: %x" % i
        print "Returned: ASCIISTR(", asciiret, ")\tUTF8(", utfret, ")\tUNICODE(", repr(cret), ")"
        print "Expected: \t\t\tUTF8(", cutf8, ")\tUNICODE(", repr(c), ")"
        fail += 1
        if fail > 50:
            break
|
From: matilda m. <ma...@gr...> - 2007-11-28 12:07:02
|
>>> Michael Schlenker <ms...@co...> 28.11.2007 12:13 >>>
> So i think cx_Oracle somehow mistakenly converts to database encoding somewhere in between.

Hi Michael,

I'm not sure this is a problem in cx_Oracle. That's the reason I wrote this
little embedded C program for the test: on the C level I only get byte sequences
anyway, so I'm sure that nothing but the Oracle client library is in between
(hopefully ;-)).

I don't know if we have a chance to verify this by using another access method.

Best regards
Andreas Mock
|
From: Michael S. <ms...@co...> - 2007-11-28 12:32:52
|
matilda matilda schrieb:
>>>> Michael Schlenker <ms...@co...> 28.11.2007 12:13 >>>
>> So i think cx_Oracle somehow mistakenly converts to database encoding somewhere in between.
>
> Hi Michael,
>
> I'm not sure if this is a problem of cx_Oracle. That's the reason I wrote this little
> embedded C program for that. On the C level I only get byte sequences anyway. So, I'm
> sure that nothing but the Oracle client library is in between (hopefully ;-)).
>
> I don't know if we have a chance to verify this by using another access method.

I tried to do a similar thing with the Tcl Oratcl package, but it failed in a
similar way (not the same way, it produced different garbage), so I don't know
whether it's OCI that's just damn hard to get right, or some environment setup
that's broken.

Sure, only the Oracle OCI is in between, but from my cursory reading of the OCI
docs it seems you need to do at least some special handling for NCHAR values
when binding, and the OCI docs are huge... Not sure if cx_Oracle does that yet;
I hoped to escape this mess by using cx_Oracle in the first place (to replace an
ancient, non-unicode-savvy Pro*C/C++-based Oracle connector, which could only be
converted to unicode with a complete rewrite).

It might all work fine for database encoding = UTF-8, but for people stuck with
older codebases and databases that need a slow migration of only some columns,
the split-brained combination of a non-Unicode database charset and a Unicode
national charset is a must.

I hope I do not have to ditch cx_Oracle again and try to find happiness down an
ODBC or JDBC route (where none is to be found).

Michael
|
From: Michael S. <ms...@co...> - 2007-11-28 16:46:28
|
Amaury Forgeot d'Arc schrieb:
> matilda matilda wrote:
>>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
>>> Michael Schlenker schrieb:
>>> Okay, i got my test to work after patching cx_Oracle a little bit.
>> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
>>
>>> From taking a closer look at the code, Unicode support is at best to be described as
>>> 'rudimentary'; lots of fine points are still missing in there.
>> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
>> be many questions to answer regarding byte-streams, unicode-streams, character set
>> conversion (implicit/explicit), and character representation.
>>
>> See the change history to see when Anthony started to focus on character set
>> conversion.
>>
>> Amaury Forgeot d'Arc, who also gives valuable input, is probably also interested
>> in that topic, speaking and writing a language with many special characters.
>
> I indeed proposed a patch one year ago, to support unicode.
> It was against version 4.2.1; I attach it again in the hope it can be useful.

Looks good. The minimal stuff I did goes a similar way, but I didn't use UTF-16
yet, so there might be buffer problems due to UTF-8's variable length...

I'll try to use your stuff with a recent cx_Oracle if I find the time.

And yes, it will break with UCS-4 builds of Python... easy to fix, though, if
one uses AL32UTF8 instead of the UTF-16 code and converts on read. That makes
the code immune to possible big-endian vs. little-endian problems too (although
I assume those are handled by OCI for UTF-16 anyway). But surrogates and the
astral planes are treacherous ground anyway, so if the BMP works for a start,
that's nice.

Michael
|
From: Amaury F. d'A. <ama...@gm...> - 2007-11-28 17:08:40
|
Michael Schlenker wrote:
> Amaury Forgeot d'Arc schrieb:
> > matilda matilda wrote:
> >>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> >>> Michael Schlenker schrieb:
> >>> Okay, i got my test to work after patching cx_Oracle a little bit.
> >> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
> >>
> >>> From taking a closer look at the code Unicode support is at best to be described as
> >>> 'rudimentary', lots of fine points still missing in there.
> >> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> >> be many questions to answer regarding byte-streams, unicode-streams, character set
> >> conversion (implicit/explicit), character representation.
> >>
> >> See the change history to see when Anthony started to focus on character set
> >> conversion.
> >>
> >> Amaury Forgeot d'Arc who also gives valuable input is probably also interested
> >> in that topic while speaking and writing a language with many special characters.
> >
> > I indeed proposed a patch one year ago, to support unicode.
> > It was against version 4.2.1, I attach it again in the hope it can be useful.
>
> Looks good, the minimal stuff I did goes a similar way but I didn't use UTF-16 yet,
> so there might be buffer problems due to UTF-8's variable length...
>
> I'll try to use your stuff with a recent cx_Oracle if I find the time.
>
> And yes, it will break with UCS-4 builds of Python..., easy to fix though, if
> one uses AL32UTF8 instead of the UTF-16 code and converts on read. Makes the code
> immune to possible big-endian vs. little-endian problems too (although I assume
> those are handled by OCI for UTF-16 anyway). But surrogates and the astral plane
> are treacherous ground anyway, so if the BMP works for a start it's nice.
I'm not sure I understand everything here, but it seemed to me that the correct
way was to set the charsetId to OCI_UTF16ID, because it is completely
independent of any NLS settings (there is no other possible value, btw). The
values are expressed in a unicode-capable encoding, and this is enough.

To support UCS-4 builds, it should be enough to properly use functions like
PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16 in StringVar_SetValue, instead
of the plain memcpy.

Sorry I don't have the time at the moment, but I am sure you can do something
with it.

--
Amaury Forgeot d'Arc
|
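In Python terms, the decode step described here (instead of a raw memcpy of the OCI buffer) looks like the following sketch, which assumes OCI hands back little-endian UTF-16:

```python
# A UTF-16 buffer as OCI_UTF16ID would deliver it for U+20AC: 2 bytes per
# BMP code point.  An explicit decode works on both narrow and wide Python
# builds, whereas a plain memcpy only matches the narrow build's internal
# 2-bytes-per-character layout.
buf = u'\u20ac'.encode('utf-16-le')
assert buf == b'\xac\x20'
assert buf.decode('utf-16-le') == u'\u20ac'
```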
From: Michael S. <ms...@co...> - 2007-11-28 17:20:54
|
Amaury Forgeot d'Arc schrieb:
> Michael Schlenker wrote:
>> Amaury Forgeot d'Arc schrieb:
>>> matilda matilda wrote:
>>>>>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
>>>>> Michael Schlenker schrieb:
>>>>> Okay, i got my test to work after patching cx_Oracle a little bit.
>>>> Anthony will be happy to hear that. ;-) Anthony: Are you still here?
>>>>
>>>>> From taking a closer look at the code Unicode support is at best to be described as
>>>>> 'rudimentary', lots of fine points still missing in there.
>>>> I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
>>>> be many questions to answer regarding byte-streams, unicode-streams, character set
>>>> conversion (implicit/explicit), character representation.
>>>>
>>>> See the change history to see when Anthony started to focus on character set
>>>> conversion.
>>>>
>>>> Amaury Forgeot d'Arc who also gives valuable input is probably also interested
>>>> in that topic while speaking and writing a language with many special characters.
>>> I indeed proposed a patch one year ago, to support unicode.
>>> It was against version 4.2.1, I attach it again in the hope it can be useful.
>> Looks good, the minimal stuff I did goes a similar way but I didn't use UTF-16 yet,
>> so there might be buffer problems due to UTF-8's variable length...
>>
>> I'll try to use your stuff with a recent cx_Oracle if I find the time.
>>
>> And yes, it will break with UCS-4 builds of Python..., easy to fix though, if
>> one uses AL32UTF8 instead of the UTF-16 code and converts on read. Makes the code
>> immune to possible big-endian vs. little-endian problems too (although I assume
>> those are handled by OCI for UTF-16 anyway). But surrogates and the astral plane
>> are treacherous ground anyway, so if the BMP works for a start it's nice.
>
> I'm not sure I understand everything here, but it seemed to me that
> the correct way was to set the charsetId to OCI_UTF16ID, because it is
> completely independent of any NLS settings (there is no other possible
> value, btw).

I don't think so. The comments at least tell me that I can feed in all valid
Oracle charsets (probably after converting the string to an id via

    ub2 OCINlsCharSetNameToId(dvoid *envhp, const oratext *name);

) and that OCI_UTF16ID is just the only one that cannot be specified via
NLS_LANG. So you can choose the one you want and let OCI deal with the
conversion; OCI_UTF16ID is just the one used in the OCI docs because it is very
convenient when working with Windows wchar_t. So it's probably the recommended
value for charsetId, but not the only possible one.

> The values are expressed in a unicode-capable encoding,
> and this is enough.

Yes.

> To support UCS-4 builds, it should be enough to properly use
> functions like PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16 in
> StringVar_SetValue, instead of the plain memcpy.

Yes.

> Sorry I don't have the time at the moment, but I am sure you can do
> something with it.

If I find the time I will, yes. Thanks again for the patch.

Michael
|
From: Anthony T. <ant...@gm...> - 2007-11-29 18:37:21
|
On Nov 28, 2007 9:28 AM, Michael Schlenker <ms...@co...> wrote:
> matilda matilda schrieb:
> >>>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> >
> >> From taking a closer look at the code Unicode support is at best to be described as
> >> 'rudimentary', lots of fine points still missing in there.
> >
> > I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> > be many questions to answer regarding byte-streams, unicode-streams, character set
> > conversion (implicit/explicit), character representation.
>
> I thought Py3k should solve those questions ;-)...
>
> I usually write Tcl, where unicode just works and you don't step into the
> deep morass that the Python string/unicode dichotomy is all the time.
> Basically it uses a model very close to the one now used by Py3k, but got it
> stable some years ago (around Tcl 8.2).
>
> For cx_Oracle i would like to see maximum ease of use, e.g. expunge the evil
> NLS_LANG dependency, as it should never ever be needed if you're using Python
> (case 1: non-unicode db encoding -> let cx_Oracle convert to the python
> system encoding internally or to unicode, and get happy; case 2: unicode db
> encoding -> let cx_Oracle just eat Python Unicode strings and spit them out).

Not everyone is going to want to switch wholesale to unicode immediately -- the
amount of pain that would cause is considerable. I would agree that long term,
that is the way to go, but some intermediate steps should probably be taken. I
am thinking that it might work to do the following:

Add a connection constructor parameter that specifies that __all__ strings
should be returned as unicode objects, not string objects. This parameter would
default to False to get the current behavior -- string objects returned (fixing
whatever is causing the problem right now). It would likely be reasonable to
return unicode for nvarchar2 data regardless of this setting, but I'd
appreciate some feedback on that.

I've also considered some setting that would allow you to specify unicode for
certain columns, but perhaps it would be better to go all unicode or the hybrid
I suggested above, and not confuse matters by adding that capability. Thoughts?

> There are some pitfalls for literal SQL in the path, but i think you can
> write the interface in a way that you don't have to worry about all this
> stupid encoding conversion stuff forced upon us by Oracle, Python and other
> forces...

What are you referring to about literal SQL in the path?

> >> With that patch (against the cx_Oracle-4.3.3.zip file) at least my test runs through
> >> cleanly when i set the right environment. One surely can do better....
> >
> > Can you easily enhance the tests in the test directory to unit test the
> > charset conversion cases?
>
> Haven't looked there yet. But maybe i can rip something from my nosetest
> files when done.

It doesn't have to be pretty, just work. I can pretty it up as long as I know
what you are trying to accomplish or can ask you questions, of course. :-)

> Basically i need to first get permission to invest more time in these issues
> with cx_Oracle; it could happen that my manager decides to go a different
> route... :-(.

Understood. Let us know.

Anthony
|
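For illustration, the difference between the two behaviours discussed here comes down to a single decode on fetch. This is only a sketch of the proposal (the driver itself is C code, and any flag name would be hypothetical); it assumes a UTF-8 client charset:

```python
# Bytes the client library receives for U+20AC under a UTF-8 client charset.
raw = b'\xe2\x82\xac'
as_string = raw                    # current behaviour: a plain byte string
as_unicode = raw.decode('utf-8')   # proposed behaviour: a unicode object
assert as_unicode == u'\u20ac'
```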
From: Jim P. <uni...@gm...> - 2007-11-30 01:17:30
|
On Nov 29, 2007 1:37 PM, Anthony Tuininga <ant...@gm...> wrote:
> Not everyone is going to want to switch wholesale to unicode
> immediately -- the amount of pain that would cause is considerable. I
> would agree that long term, that is the way to go, but some
> intermediate steps should probably be taken. I am thinking that it
> might work to do the following:
>
> Add a connection constructor parameter that specified that __all__
> strings should be returned as unicode objects, not string objects.
> This parameter would default to False to get the current behavior --
> string objects returned (fixing whatever is causing the problem right
>

I think that would be a fine solution.

> now). It would likely be reasonable to return unicode for nvarchar2
> data regardless of this setting but I'd appreciate some feedback on
> that.
>

I would be somewhat concerned that existing code might break if we change the
return type without user intervention. But that said, I don't actually have any
existing code that interfaces with NCHAR types.

> I've also considered some setting that would allow you to specify
> unicode for certain columns but perhaps it would be better to go all
> unicode or the hybrid I suggested above, and not confuse matters by
> adding that capability. Thoughts?
>

I would not need that. In my case, I'm either doing 100% Unicode or 100% ASCII.
All my legacy systems are ASCII only, and everything new is Unicode all the way.

Jim Patterson
|
From: Michael S. <ms...@co...> - 2007-12-05 17:34:24
|
Anthony Tuininga schrieb:
> Not everyone is going to want to switch wholesale to unicode
> immediately -- the amount of pain that would cause is considerable. I
> would agree that long term, that is the way to go, but some
> intermediate steps should probably be taken. I am thinking that it
> might work to do the following:
>
> Add a connection constructor parameter that specified that __all__
> strings should be returned as unicode objects, not string objects.
> This parameter would default to False to get the current behavior --
> string objects returned (fixing whatever is causing the problem right
> now). It would likely be reasonable to return unicode for nvarchar2
> data regardless of this setting but I'd appreciate some feedback on
> that.

That would be a fine thing. Always returning unicode for nvarchar2 columns is
just fine for Oracle >= 9i, but not always for older versions, where nvarchar
could be some encoding other than one of the unicode encodings.

> I've also considered some setting that would allow you to specify
> unicode for certain columns but perhaps it would be better to go all
> unicode or the hybrid I suggested above, and not confuse matters by
> adding that capability. Thoughts?

If you return unicode for all nvarchar2 columns by default, the user can always
force it with SELECT TO_NCHAR(...) FROM ..., or by using nvarchar2 as the
column type.

>> There are some pitfalls for literal SQL in the path, but i think you can write
>> the interface in a way that you don't have to worry about all this stupid encoding
>> conversion stuff forced upon us by Oracle, Python and other forces...
>
> What are you referring to about literal SQL in the path?

Oracle has the nasty habit of trashing literal unicode values in SQL when the
database encoding is non-unicode. Bind variables are fine, but literal unicode
values in SQL statements are not safe in general, because they get converted to
the database encoding while passing through the server side of OCI.

See http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch7progrunicode.htm#i1006315

So maybe the autoquoting mentioned there is a good idea.

Michael
|
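The literal-SQL trap can be simulated without a database: push a statement containing a character outside the database charset through that charset, as the server side effectively does. This sketch uses Python's cp1252 codec as a stand-in for a WE8MSWIN1252 database encoding:

```python
# A literal CJK character (U+2456, from the thread's test table) in the
# statement text does not survive conversion to the database charset:
stmt = u"INSERT INTO nls_testing VALUES ('\u2456')"
lossy = stmt.encode('cp1252', 'replace')
assert b'?' in lossy   # the character was replaced on the way through
# A bind variable's value never travels inside the statement text, which is
# why binds are safe where literals are not.
```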
From: Anthony T. <ant...@gm...> - 2007-11-29 18:48:19
|
Yes, I just looked through my e-mail and noticed this one. Some of the patch
has already been applied, but the addition of a new type that returns and
accepts unicode I have not yet done. Recently I have been looking into this
again (in conjunction with the ceODBC module) and ran into a few difficulties,
since most Unix builds are wide unicode and not narrow. Windows, of course, is
different in this respect. Your code assumes a narrow build, so I'd have to
come up with a solution for that. I believe somewhere in this thread a solution
to that problem has been given, but I'll have to see if it really solves the
problem or not. My apologies for dropping this one for so long. You should be
more persistent and send it to me every few months. :-) If you don't, unless
the patch is simple and obvious or I actually need it, it tends to get lost.
And since Unicode is tough, I've also tended to avoid it somewhat. :-(
Hopefully I'll be able to get some time to work on this in the next few weeks.
Bug me again in January if you haven't heard anything about it before then.

On Nov 28, 2007 9:32 AM, Amaury Forgeot d'Arc <ama...@gm...> wrote:
> matilda matilda wrote:
> >>> Michael Schlenker <ms...@co...> 28.11.2007 16:31 >>>
> >Michael Schlenker schrieb:
> > Okay, i got my test to work after patching cx_Oracle a little bit.
> >
> > Anthony will be happy to hear that. ;-) Anthony: Are you still here?
> >
> > From taking a closer look at the code Unicode support is at best to be described as
> >'rudimentary', lots of fine points still missing in there.
> >
> > I'm sure Anthony will agree. Especially with the upcoming Py3000 there will
> > be many questions to answer regarding byte-streams, unicode-streams, character set
> > conversion (implicit/explicit), character representation.
> >
> > See the change history to see when Anthony started to focus on character set
> > conversion.
> >
> > Amaury Forgeot d'Arc who also gives valuable input is probably also interested
> > in that topic while speaking and writing a language with many special characters.
>
> I indeed proposed a patch one year ago, to support unicode.
> It was against version 4.2.1, I attach it again in the hope it can be useful.
>
> Here are the comments I sent at the time:
>
> The visible changes are:
> - a new cx_Oracle.UNICODE variable type, which accepts and returns
> unicode values.
> - NVARCHAR columns will return this variable type.
> - a new test file: UnicodeVar.py covers the most common cases.
>
> There are some caveats:
> - I based my work on the new vt_NationalCharString type (and its
> charsetForm member). As there are no unit tests for it, I may have
> broken something there. Same for the new NCLOB type.
> - Automatic conversion between string and unicode could be supported
> (should be easy to add, by using connection.encoding)
> - I suspect it will only work if Python is compiled in UCS2 mode.
> (== sizeof(Py_UNICODE) must be 2)
>
> --
> Amaury Forgeot d'Arc
|
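The UCS2 caveat in the patch notes can be probed from Python itself; on narrow (UCS2) builds sys.maxunicode is 0xFFFF, on wide (UCS-4) builds it is 0x10FFFF:

```python
import sys

# True on UCS2 (narrow) builds, where sizeof(Py_UNICODE) == 2 and the
# patch's assumptions hold; False on wide builds.
narrow_build = (sys.maxunicode == 0xffff)
assert sys.maxunicode in (0xffff, 0x10ffff)
```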
From: Jim P. <uni...@gm...> - 2007-11-30 01:25:55
|
On Nov 29, 2007 1:48 PM, Anthony Tuininga <ant...@gm...> wrote:
> Recently I have been
> looking into this again (in conjunction with the ceODBC module) and
> ran into a few difficulties since most Unix builds are wide unicode
> and not narrow. Windows, of course, is different in this respect. Your
> code assumes a narrow build so I'd have to come up with a solution for
> that. I believe somewhere in this thread a solution to that problem
> has been given but I'll have to see if it really solves the problem or
> not.

Using AL32UTF8 as the encoding for OCI prevents this issue. You still have to
do all the other stuff (like handling buffer requirement expansions,
conversions, ...), but the code will work on all OSes and machines.

The only downside to UTF-8 might be performance. We would be taking the Unicode
in the internal Python format and converting it to UTF-8, then passing it to
OCI, which might need to convert it to something else for storage inside
Oracle. However, the default for an Oracle 10g Unicode database is AL32UTF8, so
I'm guessing that will not be a problem for most people going forward. It looks
like Oracle is choosing UTF-8 as the top among equals, since some features like
Unicode for XML in Oracle 10g require that the database be UTF-8.

Jim P.
|
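The conversion chain described here is cheap to demonstrate; the bytes for U+2456 also match the "e29196" expected at the start of the thread:

```python
# Internal Python unicode -> UTF-8 bytes, as would be handed to OCI under
# an AL32UTF8 client charset; the round trip is lossless.
text = u'\u20ac\u2456'
wire = text.encode('utf-8')
assert wire == b'\xe2\x82\xac\xe2\x91\x96'
assert wire.decode('utf-8') == text
```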
From: Michael S. <ms...@co...> - 2007-11-30 11:31:45
|
Anthony Tuininga schrieb:
> Yes, I just looked through my e-mail and noticed this one. Some of the
> patch has already been applied but the addition of a new type that
> returns and accepts unicode I have not yet done. Recently I have been
> looking into this again (in conjunction with the ceODBC module) and
> ran into a few difficulties since most Unix builds are wide unicode
> and not narrow. Windows, of course, is different in this respect. Your
> code assumes a narrow build so I'd have to come up with a solution for
> that. I believe somewhere in this thread a solution to that problem
> has been given but I'll have to see if it really solves the problem or
> not. My apologies for dropping this one for so long. You should be
> more persistent and send it to me every few months. :-) If you don't,
> unless the patch is simple and obvious or I actually need it, it tends
> to get lost. And since Unicode is tough, I've also tended to avoid it
> somewhat. :-( Hopefully I'll be able to get some time to work on this
> in the next few weeks. Bug me again in January if you haven't heard
> anything about it before then.

Amaury's patch is better than mine, so forget my patch... But it's still not
complete. I hope I can find time to test and fix the issues, and I need to do
it on a bunch of boxes (HP-UX, AIX, Windows, Solaris, Linux, etc.), so I might
have to fix that wide vs. narrow issue anyway.

Using OCINlsCreateEnv() might be better for Python use, because you can then
handle encodings much more gracefully and isolate yourself from broken NLS_LANG
settings on the client side. But it surely depends on how twisted the usual
Python usage is, and from what I have seen up to now there is quite some insane
messing with encodings in standard strings in typical multi-encoding code,
including the popular but dangerous changing of sys.getdefaultencoding() to
something nicer than ASCII.

AFAIK Unicode with Oracle is rather easy if you have a database encoding of
Unicode (AL32UTF8), because then all conversions done by OCI are essentially
lossless. BUT if you have a database encoding like CP1252 or ISO 8859-1, the
full nastiness of Oracle hits you, as all statements are converted via the
database encoding (which is sometimes a subset of the national encoding, which
is unicode nowadays...), so all statements and results are stripped of unicode
chars while passing through OCI unless specially protected or quoted.

For new databases, choosing anything but a unicode database encoding is mostly
foolish, because life gets hard; but for legacy databases you often don't have
a choice (unless your legacy is 7-bit ASCII, where you can migrate to UTF-8
easily because it is a strict superset of ASCII).

Michael
|
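The "strict superset of ASCII" point at the end is easy to verify: any 7-bit ASCII string yields byte-for-byte identical output under both encodings, which is why an ASCII legacy database can migrate to UTF-8 without touching the stored data:

```python
# Every 7-bit ASCII string encodes to the same bytes in ASCII and UTF-8.
s = u'plain 7-bit ASCII text'
assert s.encode('ascii') == s.encode('utf-8')
```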