Hello list!
I'm trying to get Unicode data out of Oracle 9.2.0.1.0
with cx_Oracle 4.0.1 and so far haven't had any luck.
The database character set of the Oracle instance is WE8ISO8859P1
and the national character set is AL16UTF16. The environment
variable NLS_LANG is set to GERMAN_GERMANY.WE8ISO8859P1.
All Unicode data is in NVARCHAR2 columns.
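For reference, NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET, and only the part after the dot names the client character set. A small Python 3 sketch that sets and splits it (the helper split_nls_lang is hypothetical and not part of cx_Oracle; setting the variable before the first connect is an assumption about when the Oracle client reads it):

```python
import os


def split_nls_lang(value):
    """Split an NLS_LANG value of the form LANGUAGE_TERRITORY.CHARSET."""
    lang_territory, _, charset = value.partition(".")
    language, _, territory = lang_territory.partition("_")
    return language, territory, charset


# Assumption: the variable has to be in place before the Oracle client
# environment is created, i.e. before the first cx_Oracle.connect().
os.environ["NLS_LANG"] = "GERMAN_GERMANY.WE8ISO8859P1"
print(split_nls_lang(os.environ["NLS_LANG"]))
# ('GERMAN', 'GERMANY', 'WE8ISO8859P1')
```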
Doing a select and fetching the data gives me something
like this:
>>> import cx_Oracle
>>> db = cx_Oracle.connect("...")
>>> c = db.cursor()
>>> c.execute("select m_name from ma_lang where spr_id=40")
[<StringVar object at 0x4019e1d0>]
>>> d = c.fetchone()
>>> d
('\xbf\xbf\xbf\xbf\xbf ...
>>>
That's what I expected, because the text is in Russian and
Oracle seems to use a strange replacement character (U+00BF
'INVERTED QUESTION MARK') for characters that can't be transcoded.
The strange thing is that setting NLS_LANG to GERMAN_GERMANY.UTF8
doesn't fix the problem. With this setting the result is
('\xc2\xbf\xc2\xbf\xc2\xbf\xc2\xbf ...
(i.e. UTF-8 encoded U+00BF)
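Both results are the same character, U+00BF, merely encoded in the respective client character set, as a quick Python 3 check shows:

```python
# U+00BF INVERTED QUESTION MARK, the replacement character seen above
replacement = "\u00bf"

latin1 = replacement.encode("iso-8859-1")  # client charset WE8ISO8859P1
utf8 = replacement.encode("utf-8")         # client charset UTF8

print(latin1, utf8)  # b'\xbf' b'\xc2\xbf'
```

Since changing NLS_LANG only changes how the replacement character is encoded, this suggests the Cyrillic characters are already replaced before the conversion to the client character set.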
The data in the database is definitely correct, because using Oracle's
dump() function reveals bytes that look like UTF-16-BE encoded Russian
text:
>>> c.execute("select dump(m_name, 1010) from ma_lang where spr_id=40")
[<StringVar object at 0x4019e1d0>]
>>> d = c.fetchone()
>>> d
('Typ=1 Len=94 CharacterSet=AL16UTF16: 4,26,4,62,4,61,4,50,4,53 ...
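Decoding the byte list by hand confirms this. A Python 3 sketch (decode_dump is a hypothetical helper) that takes the dump(col, 1010) output, where 1010 means decimal byte values plus the character set name, and decodes it as UTF-16-BE. Only the first ten bytes quoted above are used; the rest of the dump is elided:

```python
def decode_dump(dump_text):
    """Decode the byte list of dump(col, 1010) output as UTF-16-BE.

    Format 1010 = 10 (decimal byte values) + 1000 (show character set).
    """
    # Everything after ": " is the comma-separated list of byte values
    _, _, byte_list = dump_text.partition(": ")
    raw = bytes(int(b) for b in byte_list.split(","))
    return raw.decode("utf-16-be")


# Only the first ten bytes from the output above
sample = "Typ=1 Len=94 CharacterSet=AL16UTF16: 4,26,4,62,4,61,4,50,4,53"
print(decode_dump(sample))  # Конве
```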
We're using a Web application written in Java/Struts for entering the
text, and the text is displayed correctly there too. Unfortunately
there's no source for the Oracle/Java driver.
Using unistr() seems to work a little better:
>>> c.execute("select unistr(m_name) from ma_lang where spr_id=40")
[<StringVar object at 0x4019e1d0>]
>>> d = c.fetchone()
>>> d
('\x04\x1a\x04>\x04=\x042\x045\x049\x045 ...
but this breaks down as soon as there's a backslash in the string.
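The string returned by unistr() appears to be the raw UTF-16-BE bytes delivered as an 8-bit string. Decoding the first 14 bytes shown above in Python 3 gives the same text as the dump() bytes:

```python
# Raw value returned by the unistr() query above (first 14 bytes only)
raw = b"\x04\x1a\x04>\x04=\x042\x045\x049\x045"

# Each pair of bytes is one big-endian UTF-16 code unit
text = raw.decode("utf-16-be")
print(text)  # Конвейе
```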
Googling for cx_Oracle and Unicode didn't reveal any answers either
(just more questions or unrelated stuff).
Does anybody know how to get Unicode out of NVARCHAR2 columns with
cx_Oracle?
Bye,
Walter Dörwald