[cx-oracle-users] Unicode problems with Python 3 and cx_Oracle 5.1.2
From: <joa...@ot...> - 2014-04-08 17:27:19
Hi,

I'm having problems getting cx_Oracle to work with Unicode under Python 3.

The setup:

Python 3.3.2
Oracle Server: 11g
Client library: 11_2_03
NLS_LANG: American_America.UTF8
NLS_CHARACTERSET: WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16

The db table contains both varchar2 and nvarchar2 columns. I can use SQL*Plus to read and write Unicode strings in the nvarchar2 columns without any problems. In cx_Oracle, however, the same statements result in a conversion loss: the characters that are "too big" for the database character set are replaced with ¿, both in the data read and in the data inserted.

Since things work with SQL*Plus, I would assume that the server/client library side is fine and that I have missed something in how I use cx_Oracle.

Googling the issue, I found this in the Oracle Unicode guidelines: http://docs.oracle.com/cd/B28359_01/server.111/b28298/ch7progrunicode.htm#i1006452

"When you bind or define SQL NCHAR datatypes and do not set OCI_ATTR_CHARSET_FORM, data conversions take place from client character set to the database character set, and from the database character set to the national database character set. In the worst case, data loss can occur if the database character set is smaller than the client's."

This seems to describe the problem I see spot on, i.e. the string passes through the database charset (WE8MSWIN1252) during the conversion between client UTF8 and server UTF16.

In the Python 2.x code path it looks like OCI_ATTR_CHARSET_FORM is set explicitly depending on the database column type, but I can't find any similar code in the Python 3 code path. A quick hack that sets this attribute for strings seems to confirm that this is the problem: with it, the values show up correctly in queries.

So is this setup not supported, or is there something I missed? And is there any way to get around this when the database charset is not Unicode?

Thanks
Joakim
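
To illustrate the conversion path quoted from the Oracle guide, here is a minimal sketch in plain Python that imitates a string travelling from the client through a single-byte database charset on its way to the national charset. It does not call cx_Oracle or OCI at all. The function name `through_db_charset` is made up for this example, Python's `cp1252` codec is used as a stand-in for WE8MSWIN1252, and the substitution of ¿ for unmappable characters is assumed from the behaviour described above rather than taken from Oracle's conversion code.

```python
def through_db_charset(text, db_codec="cp1252", replacement="\u00bf"):
    """Imitate client -> database charset -> national charset.

    Hypothetical helper for illustration only: characters that the
    database charset cannot represent are swapped for the replacement
    character (U+00BF, the inverted question mark) before the text
    reaches the UTF-16 national charset.
    """
    kept = []
    for ch in text:
        try:
            ch.encode(db_codec)
        except UnicodeEncodeError:
            # Not representable in the database charset: information is lost here.
            kept.append(replacement)
        else:
            kept.append(ch)

    # Step 1: what survives in the single-byte database charset.
    db_bytes = "".join(kept).encode(db_codec)
    # Step 2: the surviving text is then converted to UTF-16 (AL16UTF16 stand-in).
    nchar_bytes = db_bytes.decode(db_codec).encode("utf-16-be")
    return nchar_bytes.decode("utf-16-be")


if __name__ == "__main__":
    sample = "na\u00efve \u03a9 \u65e5\u672c"  # Latin-1 letter, Greek omega, two CJK characters
    print(ascii(through_db_charset(sample)))
```

Characters that exist in the single-byte charset (such as ï) come through intact, while anything outside it is already lost before the UTF-16 step, which would match the symptom of ¿ appearing in nvarchar2 data even though the national charset itself can hold the original characters.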