[cx-oracle-users] Using cx_Oracle with unicode
From: <wa...@li...> - 2004-05-26 19:53:27
Hello list!

I'm trying to get Unicode data out of Oracle 9.2.0.1.0 with cx_Oracle 4.0.1 and so far haven't had any luck.

The database character set of the Oracle instance is WE8ISO8859P1 and the national character set is AL16UTF16. The environment variable NLS_LANG is set to GERMAN_GERMANY.WE8ISO8859P1. All Unicode data is in NVARCHAR2 columns.

Doing a select and fetching the data gives me something like this:

>>> import cx_Oracle
>>> db = cx_Oracle.connect("...")
>>> c = db.cursor()
>>> c.execute("select m_name from ma_lang where spr_id=40")
[<StringVar object at 0x4019e1d0>]
>>> d = c.fetchone()
>>> d
('\xbf\xbf\xbf\xbf\xbf ...
>>>

That's what I expected, because the text is in Russian and Oracle seems to use a strange replacement character (U+00BF 'INVERTED QUESTION MARK') for characters that are not transcodable.

The strange thing is that setting NLS_LANG to GERMAN_GERMANY.UTF8 doesn't fix the problem. With this setting the result is

('\xc2\xbf\xc2\xbf\xc2\xbf\xc2\xbf ...

(i.e. UTF-8 encoded U+00BF)

The data in the database is definitely correct, because using Oracle's dump() function reveals bytes that look like UTF-16-BE encoded Russian text:

>>> c.execute("select dump(m_name, 1010) from ma_lang where spr_id=40")
[<StringVar object at 0x4019e1d0>]
>>> d = c.fetchone()
>>> d
('Typ=1 Len=94 CharacterSet=AL16UTF16: 4,26,4,62,4,61,4,50,4,53 ...

We're using a Web application written in Java/Struts for entering the text, and the text is displayed correctly there too. Unfortunately there's no source for the Oracle/Java driver.

Using unistr() seems to work a little better:

>>> c.execute("select unistr(m_name) from ma_lang where spr_id=40")
[<StringVar object at 0x4019e1d0>]
>>> d = c.fetchone()
>>> d
('\x04\x1a\x04>\x04=\x042\x045\x049\x045 ...

but this breaks down as soon as there's a backslash in the string.

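The byte values quoted above can be checked without touching the database. The following is a minimal sketch, written for a current Python 3 rather than the Python 2 of this thread, that decodes only the bytes visible in the dump() and unistr() output as UTF-16-BE. The helper name decode_dump is hypothetical, and the parsing assumes the decimal, comma-separated layout shown in the dump() output; it is not part of cx_Oracle.

```python
import re


def decode_dump(dump_text, encoding="utf-16-be"):
    """Decode the decimal byte list of a dump(col, 1010) result string.

    Hypothetical helper: assumes the 'Typ=... CharacterSet=...: b1,b2,...'
    layout quoted in the post, with the byte list after the first colon.
    """
    _, _, byte_list = dump_text.partition(":")
    values = [int(tok) for tok in re.findall(r"\d+", byte_list)]
    return bytes(values).decode(encoding)


# Only the bytes visible in the post; the rest of the value is elided there.
dump_sample = "Typ=1 Len=94 CharacterSet=AL16UTF16: 4,26,4,62,4,61,4,50,4,53"
print(decode_dump(dump_sample))

# The unistr() result is the same byte sequence returned as a byte string.
unistr_sample = b"\x04\x1a\x04>\x04=\x042\x045\x049\x045"
print(unistr_sample.decode("utf-16-be"))

# U+00BF encoded as UTF-8, matching the result seen with NLS_LANG=...UTF8.
print("\u00bf".encode("utf-8"))
```

Both samples decode to Cyrillic letters, which supports the reading that the stored data is intact UTF-16-BE and that the replacement happens during character set conversion on the way to the client.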
Googling for cx_Oracle and Unicode didn't reveal any answers either (just more questions or unrelated stuff).

Does anybody know how to get Unicode out of NVARCHAR2 columns with cx_Oracle?

Bye,
Walter Dörwald