Thread: [cx-oracle-users] maintaining multiple connections with different encodings
From: Dana P. <da...@gm...> - 2011-05-09 13:00:40
I have a Python process which needs to connect to multiple Oracle databases with different NLS_CHARACTERSET values.

I can set the appropriate character set by setting the NLS_LANG environment variable, either in my shell or using os.environ before connecting. For UTF-8 databases I set it to "AMERICAN_AMERICA.UTF8"; for Latin-1 (ISO-8859-1) I set it to "AMERICAN_AMERICA.WE8ISO8859P1".

Here's some code that illustrates the problem:

import os
import cx_Oracle

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.UTF8'
utf8_conn = cx_Oracle.connect('dana', 'pw', 'utf8db')
print utf8_conn.encoding
s = utf8_conn.cursor().execute('SELECT :s AS str FROM dual',
    s=u'd\u00e4t\u00e4').fetchone()[0]
print s, type(s), len(s), len(s.encode('utf8'))

utf8_conn.close()

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.WE8ISO8859P1'
latin1_conn = cx_Oracle.connect('dana', 'pw', 'latin1db')
print latin1_conn.encoding
s = latin1_conn.cursor().execute('SELECT :s AS str FROM dual',
    s='d\xe4t\xe4').fetchone()[0].decode('latin1')
print s, type(s), len(s), len(s.encode('latin1'))

This outputs:

UTF-8
dätä <type 'unicode'> 4 6
ISO-8859-1
dätä <type 'unicode'> 4 4

The problem is, if I don't close the utf8 connection before opening the latin1 connection, the latin1 connection's encoding will be UTF-8. Here's the output with the utf8_conn.close() call removed:

UTF-8
dätä <type 'unicode'> 4 6
UTF-8
d¿ <type 'unicode'> 3 3

Is there any way around this? I'd like to be able to open all my database connections at startup time instead of constantly opening and closing them.

I'm using cx_Oracle 5.1 with Oracle client version 10.2.0.1 on SLES 10. Both databases that I'm connecting to are 10.2.0.5.

Thanks,

Dana P
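The mangled "d¿" in the last output is the signature of an encoding mismatch: Latin-1 bytes being interpreted as UTF-8. A minimal sketch of the same mismatch in plain Python (shown in Python 3 syntax; Oracle's character-set conversion substitutes replacement characters where Python's strict decoder raises, so the exact garbled result differs):

```python
raw = b"d\xe4t\xe4"  # u'dätä' encoded as Latin-1

# Interpreted correctly as Latin-1, the bytes round-trip cleanly:
assert raw.decode("latin-1") == u"d\u00e4t\u00e4"

# Interpreted as UTF-8, the byte 0xE4 announces a three-byte sequence
# that never arrives, so strict decoding fails outright:
try:
    raw.decode("utf-8")
    assert False, "should not be valid UTF-8"
except UnicodeDecodeError:
    pass

# A lenient decoder substitutes replacement characters instead,
# which is how characters get silently lost in transit:
assert raw.decode("utf-8", errors="replace") == u"d\ufffdt\ufffd"
```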
From: Anthony T. <ant...@gm...> - 2011-05-09 16:14:44
Hi,

First, Oracle does not allow redefining the default encoding across multiple connections, so the behavior you are seeing is expected.

Second, why do you want to use different encodings? Oracle will automatically translate between whatever encoding is used in the database and whatever encoding you are using on the client anyway. So you can feel quite free to use UTF-8 for both databases and everything will work just fine, since UTF-8 can represent all of the characters in the Latin-1 character set.

Hope that helps.

Anthony

On Mon, May 9, 2011 at 7:00 AM, Dana Pieluszczak <da...@gm...> wrote:
> <snip>
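Anthony's claim that UTF-8 covers the entire Latin-1 repertoire can be checked mechanically, independent of Oracle; a small sketch (Python 3 syntax):

```python
# Every one of the 256 Latin-1 byte values maps to a Unicode code point
# (U+0000..U+00FF), and every such code point survives a round trip
# through UTF-8, so a UTF-8 client encoding loses nothing when reading
# from a Latin-1 database.
latin1_text = bytes(range(256)).decode("latin-1")
assert latin1_text.encode("utf-8").decode("utf-8") == latin1_text

# The reverse does not hold: characters outside U+0000..U+00FF, such
# as the euro sign, cannot be encoded as Latin-1 at all.
try:
    u"\u20ac".encode("latin-1")
    assert False, "euro sign is not representable in Latin-1"
except UnicodeEncodeError:
    pass
```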
From: Dana P. <da...@gm...> - 2011-05-09 19:28:47
> Second, why do you want to use different encodings? Oracle will
> automatically translate between whatever encoding is used in the
> database and whatever encoding you are using on the client anyway.

Ok, this is ideal, so I tried using UTF-8 for both connections. This works for my trivial example, that is, binding a variable and selecting it back out. But it does not work if I actually insert data into a table and select it back out.

The table below is defined as follows (NLS_LENGTH_SEMANTICS is 'BYTE' for both databases):

CREATE TABLE test (str VARCHAR2(30 BYTE));

import os
import cx_Oracle

databases = ('utf8db', 'latin1db')
username = 'dana'
password = 'whatever'

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.UTF8'
for tns in databases:
    conn = cx_Oracle.connect(username, password, tns)
    print "connection encoding: %s" % conn.encoding
    conn.cursor().execute('INSERT INTO test (str) VALUES (:s)',
        s=u'd\u00e4t\u00e4')
    s = conn.cursor().execute('SELECT str FROM test').fetchone()[0].decode('utf8')
    print s, type(s), len(s), len(s.encode('utf8'))
    conn.close()

Which outputs:

connection encoding: UTF-8
dåta <type 'unicode'> 4 5
connection encoding: UTF-8
d¿ta <type 'unicode'> 4 5

Perhaps I'm missing something?

On Mon, May 9, 2011 at 12:14 PM, Anthony Tuininga <ant...@gm...> wrote:
> <snip>
From: Doug H. <djh...@te...> - 2011-05-09 22:57:13
On 2011-05-09 13:28, Dana Pieluszczak wrote:
> <snip>
> The table below is defined as (NLS_LENGTH_SEMANTICS IS 'BYTE' for both
> databases):
>
> CREATE TABLE test (str VARCHAR2(30 BYTE));
> <snip>

What are the values of NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET from the sys.nls_database_parameters view? Mine are AL32UTF8 and AL16UTF16 respectively.

It may be illuminating to use this query:

select str, dump(str) from test;

to see what is actually stored in the table. This is what I get:

dätä    Typ=1 Len=6: 100,195,164,116,195,164

--
Doug Henderson, Calgary, Alberta, Canada
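The byte list that DUMP() prints can be fed back into Python to confirm which encoding is actually stored in the column; a sketch using Doug's example output (`decode_dump` is a hypothetical helper for illustration, not part of cx_Oracle):

```python
def decode_dump(byte_values, encoding="utf-8"):
    """Decode the comma-separated byte values reported by Oracle's
    DUMP() function and return the text, or None if the bytes are not
    valid in the given encoding. (Illustrative helper only.)"""
    raw = bytes(byte_values)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Doug's DUMP output: Typ=1 Len=6: 100,195,164,116,195,164
stored = [100, 195, 164, 116, 195, 164]
print(decode_dump(stored, "utf-8"))    # dätä: the column holds valid UTF-8
print(decode_dump(stored, "latin-1"))  # dÃ¤tÃ¤: Latin-1 "succeeds" but yields mojibake
```

The second call shows why DUMP() beats eyeballing query output: Latin-1 decoding never fails, so only the raw bytes reveal the true encoding.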
From: Dana P. <da...@gm...> - 2011-05-10 13:37:53
Looks like this is an ID10T error. I had previously inserted some data into my test table which, on the Latin-1 database, was neither valid Latin-1 nor valid UTF-8 (I blame sqlplus). So, as Anthony said before, setting NLS_LANG to 'AMERICAN_AMERICA.UTF8' gives me UTF-8 encoded data from both databases.

Thanks for the tip about DUMP(); I didn't know that function existed and wouldn't have figured this out without it.

Dana P

On Mon, May 9, 2011 at 6:41 PM, Doug Henderson <djh...@te...> wrote:
> <snip>
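Rows like Dana's bad test data can be found programmatically. Strictly speaking, Latin-1 decoding never raises (every byte maps to some code point), so the practical test for rows inserted through a client with a mismatched NLS_LANG is whether the stored bytes survive a strict UTF-8 decode. A sketch (hypothetical helper name, Python 3 syntax):

```python
def written_as_utf8(raw):
    """Return True if raw column bytes decode as strict UTF-8.
    Latin-1 decoding cannot fail, so a strict UTF-8 decode is the
    practical check for rows written with the wrong client encoding.
    (Illustrative helper, not part of cx_Oracle.)"""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A row written by a UTF-8 client passes:
assert written_as_utf8(u"d\u00e4t\u00e4".encode("utf-8"))
# The same text written by a Latin-1 client fails the check:
assert not written_as_utf8(u"d\u00e4t\u00e4".encode("latin-1"))
```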
From: Anthony T. <ant...@gm...> - 2011-05-10 17:50:41
FYI, I just found out that SQL*Plus on Windows does NOT support Unicode at all. If you need to insert Unicode data, use SQL Developer or iSQL*Plus, or cx_Oracle, of course! :-)

So there is a very good possibility that SQL*Plus screwed up your data for you.

Glad to hear you got the problem resolved, though.

Anthony

On Tue, May 10, 2011 at 7:37 AM, Dana Pieluszczak <da...@gm...> wrote:
> <snip>
From: Doug H. <djh...@te...> - 2011-05-12 12:57:24
I use a mixed bag of tools, including SQL*Plus, SQL Developer, and Python scripts using cx_Oracle. I need to be able to correctly store and retrieve a wide range of international characters with all of these tools, and it was very frustrating to get different results depending on the tool and the environment configuration. BTW, cx_Oracle always seemed to work correctly.

I ran some experiments with SQL*Plus from the 11.1 instant client on a Win7 box, connecting over the LAN to a Linux box running a 10g server.

SQL*Plus does not recognize the BOM in an 8-bit script file; it gets treated as an invalid character. As far as I could tell, it does not recognize a 16-bit Unicode file at all. It does read 8-bit source files, and the characters are interpreted according to the setting of the NLS_LANG environment variable. I use NLS_LANG=ENGLISH_CANADA.UTF8 in my Windows batch files to get UTF-8 encoded characters properly loaded into Oracle tables.

The NCHR function takes a 16-bit binary Unicode code point and creates the corresponding character in the database, similar to CHR, which takes an 8-bit code to create an ASCII character. This is helpful for generating the correct character when you cannot rely on the NLS_LANG setting.

One needs to be careful in interpreting the terminal output from SQL*Plus, as it is sent to the command window, where it may be incorrectly displayed. For this reason, I use the SQL DUMP function to display the binary content of the database column instead of relying on the displayed characters in the command window.

On 2011-05-10 11:47, Anthony Tuininga wrote:
> FYI, I just found out that SQL*Plus on Windows does NOT support
> Unicode at all.
> <snip>

--
Doug Henderson, Calgary, Alberta, Canada
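Doug's observation that SQL*Plus treats a UTF-8 BOM as an invalid character suggests stripping the BOM before handing a script to SQL*Plus. A sketch of that preprocessing step (hypothetical helper and workflow, Python 3 syntax):

```python
import codecs

def strip_utf8_bom(data):
    """Remove a leading UTF-8 byte order mark (EF BB BF) from script
    bytes, since SQL*Plus does not recognize it and treats it as an
    invalid character. (Illustrative helper, assumed workflow.)"""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# A UTF-8 script saved by an editor that writes a BOM:
script = codecs.BOM_UTF8 + b"select str, dump(str) from test;\n"
clean = strip_utf8_bom(script)
assert clean == b"select str, dump(str) from test;\n"
# Scripts without a BOM pass through untouched:
assert strip_utf8_bom(clean) == clean
```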