[cx-oracle-users] Issue with UTF-8 encoding


Hi,

I am having an encoding issue using Python 2.6.2 and cx_Oracle 5.0.4 to
access an Oracle 11g database.

I have set the NLS_LANG environment variable to .AL32UTF8, and my table
in Oracle is correctly configured to accept Unicode.

I compiled cx_Oracle without the WITH_UNICODE flag and passed unicode()
objects to cx_Oracle. Everything worked without exceptions or warnings.
However, Oracle would sometimes complain that the string I was trying
to insert into a VARCHAR2 field was too big (> 4000 bytes), even when
the string's UTF-8 size ( len(the_string.encode('utf-8')) ) was only
about 2300 bytes. I used a sniffer and confirmed that the Oracle client
was sending two bytes for every character (even the ASCII ones),
instead of using more than one byte only for non-ASCII characters, as
UTF-8 would.
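The arithmetic behind the error can be reproduced with a short sketch. It assumes the client sends every character as two bytes (UTF-16 little-endian, which matches what the sniffer capture suggested) and uses a hypothetical sample string of 2200 ASCII characters plus 50 accented ones. `byte_lengths` and `VARCHAR2_MAX_BYTES` are illustrative names, not part of cx_Oracle; the code runs on Python 2.6 and Python 3.

```python
# Maximum VARCHAR2 size in Oracle 11g: 4000 bytes (bytes, not characters).
VARCHAR2_MAX_BYTES = 4000


def byte_lengths(text):
    """Return (UTF-8 size, two-bytes-per-character size) of a unicode string."""
    utf8_size = len(text.encode("utf-8"))
    # 'utf-16-le' writes no BOM, so each character in the Basic Multilingual
    # Plane costs exactly 2 bytes, ASCII included (assumed wire format).
    utf16_size = len(text.encode("utf-16-le"))
    return utf8_size, utf16_size


# Hypothetical sample: mostly ASCII plus a few accented characters.
sample = u"a" * 2200 + u"\u00e7" * 50

utf8_size, utf16_size = byte_lengths(sample)
print("UTF-8 bytes: %d" % utf8_size)    # 2300
print("UTF-16 bytes: %d" % utf16_size)  # 4500
print("fits as UTF-8: %s" % (utf8_size <= VARCHAR2_MAX_BYTES))    # True
print("fits as UTF-16: %s" % (utf16_size <= VARCHAR2_MAX_BYTES))  # False
```

The same 2250 characters fit comfortably as UTF-8 but exceed the 4000-byte limit once every character is sent as two bytes.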

It seems to me that, when the WITH_UNICODE flag is unset, cx_Oracle
accepts unicode() objects but does not encode() them to the encoding
set in the NLS_LANG variable. Instead, it just sends Oracle the
internal representation of the unicode() object.
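One way to test this hypothesis is to encode the values to UTF-8 before binding them. This sketch assumes that a cx_Oracle build without WITH_UNICODE passes byte strings through unchanged in the client character set (AL32UTF8 here). `encode_params` and the `messages` table are hypothetical, and the `cursor.execute` call is left as a comment because it needs a live connection.

```python
def encode_params(params, encoding="utf-8"):
    """Return a copy of a bind-parameter dict with unicode values encoded."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    encoded = {}
    for name, value in params.items():
        if isinstance(value, text_type):
            value = value.encode(encoding)
        encoded[name] = value
    return encoded


body = u"a" * 2200 + u"\u00e7" * 50  # hypothetical sample value
params = encode_params({"id": 1, "body": body})
print("body bytes: %d" % len(params["body"]))  # 2300

# Hypothetical insert; requires a connected cx_Oracle cursor:
# cursor.execute("INSERT INTO messages (id, body) VALUES (:id, :body)", params)
```

If the insert succeeds with the pre-encoded value but fails with the unicode() one, that would support the idea that cx_Oracle is not encoding unicode() objects itself in this build.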

Is this behaviour expected? Am I doing something wrong?

Regards,

Guilherme.