[cx-oracle-users] Bad conversion of a unicode value?
From: Michael S. <ms...@co...> - 2007-11-26 17:44:08
Hi,

I'm trying to use Unicode with cx_Oracle but get some strange results. Maybe someone can enlighten me as to what goes wrong.

The setup:

  cx_Oracle.version = 4.3.3
  OCI from instantclient10_1 (oci.dll version 10.01.0000.0004)
  NLS_LANG set to GERMAN_GERMANY.AL32UTF8 before loading cx_Oracle
  Windows XP SP2

From sqlplus:

SQL> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8MSWIN1252
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              10.2.0.3.0

I created a table in sqlplus with some Unicode characters in it:

CREATE TABLE nls_testing (unicp NVARCHAR2(10));
INSERT INTO nls_testing VALUES (NCHR(8364)); -- U+20AC euro sign, 3 UTF-8 bytes (e2 82 ac)
INSERT INTO nls_testing VALUES (NCHR(9302)); -- U+2456, also 3 UTF-8 bytes (e2 91 96)
COMMIT;

A select shows that everything is fine in the database; I see the correct Unicode code points.
SELECT asciistr(unicp) FROM nls_testing;

ASCIISTR(UNICP)
--------------------------------------------------------------------------------
\20AC
\2456

Now I tried to select the data with cx_Oracle from a Python shell (Python 2.5.1, Windows):

>>> import cx_Oracle
>>> cx_Oracle.version
'4.3.3'
>>> conn = cx_Oracle.connect('...')
>>> cur = conn.cursor()
>>> conn.encoding
'UTF-8'
>>> conn.nencoding
'UTF-8'
>>> cur.execute('select unicp, asciistr(unicp) from nls_testing')
[<cx_Oracle.STRING with value None>, <cx_Oracle.STRING with value None>]
>>> rows = cur.fetchall()
>>> print rows
[('\xe2\x82\xac', '\\20AC'), ('\xc2\xbf', '\\2456')]
>>> for row in rows:
...     print repr(unicode(row[0].decode('utf-8')))
...
u'\u20ac'
u'\xbf'

So I get the euro sign correctly delivered and decoded from UTF-8, but I get garbage for the second character: '\xc2\xbf' is the UTF-8 encoding of U+00BF (the inverted question mark), while the expected UTF-8 bytes for code point U+2456 are e2 91 96.

Any idea what goes wrong here?

Michael

-- 
Michael Schlenker
Software Engineer
CONTACT Software GmbH           Tel.: +49 (421) 20153-80
Wiener Straße 1-3               Fax:  +49 (421) 20153-41
28359 Bremen
http://www.contact.de/          E-Mail: ms...@co...

Registered office: Bremen | Managing Director: Karl Heinz Zachries
Registered in the commercial register of the Bremen district court (Amtsgericht Bremen) under HRB 13215
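The byte strings in the question can be checked in plain Python. The sketch below (Python 3, not the Python 2.5 used above) computes the UTF-8 bytes each code point should produce. It also tests one possible explanation that the post does not confirm: that the NVARCHAR2 value was narrowed to the database character set WE8MSWIN1252 (Python's "cp1252" codec) somewhere between the column and the client, with Oracle substituting '¿' (U+00BF) for anything cp1252 cannot represent. The euro sign exists in cp1252 (byte 0x80); U+2456 does not. The replacement character and the helper names expected_utf8 and via_cp1252 are assumptions made for illustration only.

```python
# Sketch only. ASSUMPTION (not confirmed in the thread): the national-character
# value passed through the database character set WE8MSWIN1252 ("cp1252") on its
# way to the client, and Oracle replaced unmappable characters with U+00BF ('¿').

EURO = "\u20ac"   # euro sign, present in cp1252 as byte 0x80
OTHER = "\u2456"  # code point from the post, absent from cp1252


def expected_utf8(ch: str) -> str:
    """Hypothetical helper: hex string of the UTF-8 encoding of one character."""
    return ch.encode("utf-8").hex()


def via_cp1252(ch: str, replacement: str = "\u00bf") -> bytes:
    """Hypothetical helper: simulate a lossy hop through cp1252, then return
    the UTF-8 bytes a UTF-8 client would receive.

    The replacement character is an assumption modelled on Oracle's '¿'.
    """
    try:
        # Round-trip through cp1252; succeeds only if cp1252 can hold ch.
        narrowed = ch.encode("cp1252").decode("cp1252")
    except UnicodeEncodeError:
        # Character has no cp1252 mapping: substitute the replacement char.
        narrowed = replacement
    return narrowed.encode("utf-8")


if __name__ == "__main__":
    for ch in (EURO, OTHER):
        print(f"U+{ord(ch):04X}: expected {expected_utf8(ch)}, "
              f"via cp1252 {via_cp1252(ch)!r}")
```

Under this assumption the euro sign survives the hop unchanged (b'\xe2\x82\xac'), while U+2456 comes out as b'\xc2\xbf', the same bytes fetched in the session above. That match is consistent with the hypothesis but does not prove it; how cx_Oracle 4.3.3 binds and fetches NVARCHAR2 columns would still need to be checked.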