Thread: [cx-oracle-users] maintaining multiple connections with different encodings
From: Dana P. <da...@gm...> - 2011-05-09 13:00:40
I have a Python process which needs to connect to multiple Oracle databases with different NLS_CHARACTERSET values.

I can set the appropriate character set by setting the NLS_LANG environment variable, either in my shell or using os.environ before connecting. For UTF-8 databases I set it to "AMERICAN_AMERICA.UTF8"; for Latin-1 (ISO-8859-1) I set it to "AMERICAN_AMERICA.WE8ISO8859P1".

Here's some code that illustrates the problem:

import os
import cx_Oracle

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.UTF8'
utf8_conn = cx_Oracle.connect('dana', 'pw', 'utf8db')
print utf8_conn.encoding
s = utf8_conn.cursor().execute('SELECT :s AS str FROM dual',
    s=u'd\u00e4t\u00e4').fetchone()[0]
print s, type(s), len(s), len(s.encode('utf8'))

utf8_conn.close()

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.WE8ISO8859P1'
latin1_conn = cx_Oracle.connect('dana', 'pw', 'latin1db')
print latin1_conn.encoding
s = latin1_conn.cursor().execute('SELECT :s AS str FROM dual',
    s='d\xe4t\xe4').fetchone()[0].decode('latin1')
print s, type(s), len(s), len(s.encode('latin1'))

This outputs:

UTF-8
dätä <type 'unicode'> 4 6
ISO-8859-1
dätä <type 'unicode'> 4 4

The problem is, if I don't close the utf8 connection before opening the latin1 connection, the latin1 connection's encoding will be UTF-8. Here's the output with the utf8_conn.close() call removed:

UTF-8
dätä <type 'unicode'> 4 6
UTF-8
d¿ <type 'unicode'> 3 3

Is there any way around this? I'd like to be able to open all my database connections at startup time instead of constantly opening and closing them.

I'm using cx_Oracle 5.1 with Oracle client version 10.2.0.1 on SLES 10. Both databases that I'm connecting to are 10.2.0.5.

Thanks,

Dana P
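The mangled "d¿" in the last output is the signature of an encoding mismatch: Latin-1 bytes being interpreted as UTF-8. A minimal sketch of the same mismatch in plain Python (shown in Python 3 syntax; Oracle's character-set conversion substitutes replacement characters where Python's strict decoder raises, so the exact garbled result differs):

```python
raw = b"d\xe4t\xe4"  # u'dätä' encoded as Latin-1

# Interpreted correctly as Latin-1, the bytes round-trip cleanly:
assert raw.decode("latin-1") == u"d\u00e4t\u00e4"

# Interpreted as UTF-8, the byte 0xE4 announces a three-byte sequence
# that never arrives, so strict decoding fails outright:
try:
    raw.decode("utf-8")
    assert False, "should not be valid UTF-8"
except UnicodeDecodeError:
    pass

# A lenient decoder substitutes replacement characters instead,
# which is how characters get silently lost in transit:
assert raw.decode("utf-8", errors="replace") == u"d\ufffdt\ufffd"
```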
From: Anthony T. <ant...@gm...> - 2011-05-09 16:14:44
Hi,

First, Oracle does not allow redefining the default encoding across multiple connections, so the behavior you are seeing is expected.

Second, why do you want to use different encodings? Oracle will automatically translate between whatever encoding is used in the database and whatever encoding you are using on the client anyway. So you can feel quite free to use UTF-8 for both databases and everything will work just fine, since UTF-8 can represent all of the characters in the Latin-1 character set.

Hope that helps.

Anthony

On Mon, May 9, 2011 at 7:00 AM, Dana Pieluszczak <da...@gm...> wrote:
> <snip>
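Anthony's claim that UTF-8 covers the entire Latin-1 repertoire can be checked mechanically, independent of Oracle; a small sketch (Python 3 syntax):

```python
# Every one of the 256 Latin-1 byte values maps to a Unicode code point
# (U+0000..U+00FF), and every such code point survives a round trip
# through UTF-8, so a UTF-8 client encoding loses nothing when reading
# from a Latin-1 database.
latin1_text = bytes(range(256)).decode("latin-1")
assert latin1_text.encode("utf-8").decode("utf-8") == latin1_text

# The reverse does not hold: characters outside U+0000..U+00FF, such
# as the euro sign, cannot be encoded as Latin-1 at all.
try:
    u"\u20ac".encode("latin-1")
    assert False, "euro sign is not representable in Latin-1"
except UnicodeEncodeError:
    pass
```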
From: Dana P. <da...@gm...> - 2011-05-09 19:28:47
> Second, why do you want to use different encodings? Oracle will
> automatically translate between whatever encoding is used in the
> database and whatever encoding you are using on the client anyway.

Ok, this is ideal, so I tried using UTF-8 for both connections. This works for my trivial example, that is, binding a variable and selecting it back out. But it does not work if I actually insert data into a table and select it back out.

The table below is defined as follows (NLS_LENGTH_SEMANTICS is 'BYTE' for both databases):

CREATE TABLE test (str VARCHAR2(30 BYTE));

import os
import cx_Oracle

databases = ('utf8db', 'latin1db')
username = 'dana'
password = 'whatever'

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.UTF8'
for tns in databases:
    conn = cx_Oracle.connect(username, password, tns)
    print "connection encoding: %s" % conn.encoding
    conn.cursor().execute('INSERT INTO test (str) VALUES (:s)',
        s=u'd\u00e4t\u00e4')
    s = conn.cursor().execute('SELECT str FROM test').fetchone()[0].decode('utf8')
    print s, type(s), len(s), len(s.encode('utf8'))
    conn.close()

Which outputs:

connection encoding: UTF-8
dåta <type 'unicode'> 4 5
connection encoding: UTF-8
d¿ta <type 'unicode'> 4 5

Perhaps I'm missing something?

On Mon, May 9, 2011 at 12:14 PM, Anthony Tuininga <ant...@gm...> wrote:
> <snip>
From: Doug H. <djh...@te...> - 2011-05-09 22:57:13
On 2011-05-09 13:28, Dana Pieluszczak wrote:
> <snip>
> The table below is defined as (NLS_LENGTH_SEMANTICS IS 'BYTE' for both
> databases):
>
> CREATE TABLE test (str VARCHAR2(30 BYTE));
> <snip>

What are the values of NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET from the sys.nls_database_parameters view? Mine are AL32UTF8 and AL16UTF16 respectively.

It may be illuminating to use this query:

select str, dump(str) from test;

to see what is actually stored in the table. This is what I get:

dätä    Typ=1 Len=6: 100,195,164,116,195,164

--
Doug Henderson, Calgary, Alberta, Canada
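The byte list that DUMP() prints can be fed back into Python to confirm which encoding is actually stored in the column; a sketch using Doug's example output (`decode_dump` is a hypothetical helper for illustration, not part of cx_Oracle):

```python
def decode_dump(byte_values, encoding="utf-8"):
    """Decode the comma-separated byte values reported by Oracle's
    DUMP() function and return the text, or None if the bytes are not
    valid in the given encoding. (Illustrative helper only.)"""
    raw = bytes(byte_values)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Doug's DUMP output: Typ=1 Len=6: 100,195,164,116,195,164
stored = [100, 195, 164, 116, 195, 164]
print(decode_dump(stored, "utf-8"))    # dätä: the column holds valid UTF-8
print(decode_dump(stored, "latin-1"))  # dÃ¤tÃ¤: Latin-1 "succeeds" but yields mojibake
```

The second call shows why DUMP() beats eyeballing query output: Latin-1 decoding never fails, so only the raw bytes reveal the true encoding.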
From: Dana P. <da...@gm...> - 2011-05-10 13:37:53
Looks like this is an ID10T error. I had previously inserted some data into my test table which, on the Latin-1 database, was neither valid Latin-1 nor valid UTF-8 (I blame sqlplus). So, as Anthony said before, setting NLS_LANG to 'AMERICAN_AMERICA.UTF8' gives me UTF-8 encoded data from both databases.

Thanks for the tip about DUMP(); I didn't know that function existed and wouldn't have figured this out without it.

Dana P

On Mon, May 9, 2011 at 6:41 PM, Doug Henderson <djh...@te...> wrote:
> <snip>
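Rows like Dana's bad test data can be found programmatically. Strictly speaking, Latin-1 decoding never raises (every byte maps to some code point), so the practical test for rows inserted through a client with a mismatched NLS_LANG is whether the stored bytes survive a strict UTF-8 decode. A sketch (hypothetical helper name, Python 3 syntax):

```python
def written_as_utf8(raw):
    """Return True if raw column bytes decode as strict UTF-8.
    Latin-1 decoding cannot fail, so a strict UTF-8 decode is the
    practical check for rows written with the wrong client encoding.
    (Illustrative helper, not part of cx_Oracle.)"""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A row written by a UTF-8 client passes:
assert written_as_utf8(u"d\u00e4t\u00e4".encode("utf-8"))
# The same text written by a Latin-1 client fails the check:
assert not written_as_utf8(u"d\u00e4t\u00e4".encode("latin-1"))
```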
From: Anthony T. <ant...@gm...> - 2011-05-10 17:50:41
FYI, I just found out that SQL*Plus on Windows does NOT support Unicode at all. If you need to insert Unicode data, use SQL Developer or iSQL*Plus, or cx_Oracle, of course! :-)

So there is a very good possibility that SQL*Plus screwed up your data for you.

Glad to hear you got the problem resolved, though.

Anthony

On Tue, May 10, 2011 at 7:37 AM, Dana Pieluszczak <da...@gm...> wrote:
> <snip>
From: Doug H. <djh...@te...> - 2011-05-12 12:57:24
I use a mixed bag of tools, including SQL*Plus, SQL Developer, and Python scripts using cx_Oracle. I need to be able to correctly store and retrieve a wide range of international characters with all of these tools, and it was very frustrating to get different results depending on the tool and the environment configuration. BTW, cx_Oracle always seemed to work correctly.

I ran some experiments with SQL*Plus from the 11.1 instant client on a Win7 box, connecting over the LAN to a Linux box running a 10g server.

SQL*Plus does not recognize the BOM in an 8-bit script file; it gets treated as an invalid character. As far as I could tell, it does not recognize a 16-bit Unicode file at all. It does read 8-bit source files, and the characters are interpreted according to the setting of the NLS_LANG environment variable. I use NLS_LANG=ENGLISH_CANADA.UTF8 in my Windows batch files to get UTF-8 encoded characters properly loaded into Oracle tables.

The NCHR function takes a 16-bit binary Unicode code point and creates the corresponding character in the database, similar to CHR, which takes an 8-bit code to create an ASCII character. This is helpful for generating the correct character when you cannot rely on the NLS_LANG setting.

One needs to be careful in interpreting the terminal output from SQL*Plus, as it is sent to the command window, where it may be incorrectly displayed. For this reason, I use the SQL DUMP function to display the binary content of the database column instead of relying on the displayed characters in the command window.

On 2011-05-10 11:47, Anthony Tuininga wrote:
> FYI, I just found out that SQL*Plus on Windows does NOT support
> Unicode at all.
> <snip>

--
Doug Henderson, Calgary, Alberta, Canada
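Doug's observation that SQL*Plus treats a UTF-8 BOM as an invalid character suggests stripping the BOM before handing a script to SQL*Plus. A sketch of that preprocessing step (hypothetical helper and workflow, Python 3 syntax):

```python
import codecs

def strip_utf8_bom(data):
    """Remove a leading UTF-8 byte order mark (EF BB BF) from script
    bytes, since SQL*Plus does not recognize it and treats it as an
    invalid character. (Illustrative helper, assumed workflow.)"""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# A UTF-8 script saved by an editor that writes a BOM:
script = codecs.BOM_UTF8 + b"select str, dump(str) from test;\n"
clean = strip_utf8_bom(script)
assert clean == b"select str, dump(str) from test;\n"
# Scripts without a BOM pass through untouched:
assert strip_utf8_bom(clean) == clean
```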