Re: [cx-oracle-users] Problems with UTF8 and unicode version of cx_Oracle
Brought to you by:
atuining
From: Amaury F. d'A. <ama...@gm...> - 2010-07-07 13:57:06
|
Hi, 2010/7/7 Stefan Dietrich <ste...@de...>: > Hello, > > I´m using the cx_Oracle module for importing data from another database > (ZODB), but I have problems with the character encoding. > The data in ZODB is entered through a web form, so the character > encoding in the ZODB is UTF-8. > > Everything is fine until special characters appear or characters from > foreign languages, e.g. a dash, cyrillic characters or the German letter > "ß". However, German umlauts (üäö) are not affected. > If I use "print" to display them, everything is fine. > > But if I try to insert them into the Oracle DB, they appear as an > inverted question mark in the Oracle SQL Developer or SQLPlus. > If I query my Oracle DB again with cx_Oracle, they appear as inverted > questionmarks. > > Example (trying to insert data with a dash): > > title = conference.getTitle().decode('utf-8') > print title > 2nd Workshop: GISAXS – an advanced scattering method > title > u'2nd Workshop: GISAXS \u2013 an advanced scattering method' > cursor.execute(u"INSERT INTO conf (CID, title) VALUES (4, '" > +title+ "')") > conn.commit() > cursor.execute(u"SELECT * FROM conf WHERE CID = 4") > fetch = cursor.fetchall() > fetch[0][1] > u'2nd Workshop: GISAXS \xbf an advanced scattering method' > print fetch[0][1] > 2nd Workshop: GISAXS ¿ an advanced scattering method > > > If I copy&paste the output of print with the dash and insert the data > with Oracle SQLDeveloper, everthing is correct. Even cx_Oracle returns > the dash after a select. > > I´m using cx_Oracle 5.0.3 Unicode Version with Python 2.4.3 on > Scientific Linux 5.3. The field is NVARCHAR2 (happens also with > VARCHAR), NLS_LANG is set to GERMAN_GERMANY.UTF8 and > NLS_NCHAR_CHARACTERSET is UTF8. > > I´m out of ideas, so I would appreciate any help and thanks in advance! Even if the column is defined as NVARCHAR2, you pass it in the SQL statement as a string literal. Oracle always considers SQL statements as strings (= VARCHAR), so the special character will be lost if it cannot be encoded in the NLS_CHAR_CHARACTERSET. You should not pass the values as string literals in the query, but use bind variables instead: cursor.execute("INSERT INTO conf (CID, title) VALUES (4, :1)", [title]) Now the query only contains plain ascii characters; the unicode value is passed through a bind variable which will respect all characters. -- Amaury Forgeot d'Arc |