Re: [cx-oracle-users] Extracting data in Native Language: Error UnicodeEncodeError
From: Chris G. <chr...@to...> - 2011-10-06 11:44:19
Just for reference, if you're using Oracle 10 or later you don't need to use NVARCHAR2 to store multi-byte characters. You can use VARCHAR2 (x CHAR) instead. The "x" here limits the field to a certain number of characters, not bytes. By default, the length specified for VARCHAR2 is in bytes unless you add the CHAR qualifier. However, you can change this at system level by setting NLS_LENGTH_SEMANTICS to CHAR. That way, if you declare a field as, say, VARCHAR2(10), it will default to 10 CHAR, not 10 BYTE.

Even if you've already set up some VARCHAR2 fields, you can easily change the definitions to switch them all to CHAR. Below is a script to do this which I've used myself on databases. The nice thing about doing this is that you can then carry on defining fields in future as, say, VARCHAR2(10) and know that each will hold 10 characters, regardless of how many bytes each character is, and you don't need to bother with NVARCHAR2 at all. I've used this approach on several systems and these days I never give any thought to what characters are being stored.

Note - if you're using partitioning and have already got partition key columns which are VARCHAR2, trying to change them from BYTE to CHAR will raise an exception. The script picks this up and reports it.

--
-- this is to change dictionary definitions for existing cols from BYTE to CHAR
-- The database should already have had NLS_LENGTH_SEMANTICS = CHAR set
-- This has to be run in each application schema that has tables
-- (not in any Oracle schemas like SYS or SYSTEM)
-- ORA-14060 - cannot change length of partitioning column.
-- So this needs to be done before any tables are created where partitioning
-- is on an affected (VARCHAR2 or CHAR) col

alter session set NLS_LENGTH_SEMANTICS = CHAR;

declare
  v_stmt varchar2(1000);
  PARTITIONING_COLUMN EXCEPTION;
  pragma exception_init(PARTITIONING_COLUMN, -14060);
begin
  dbms_output.enable(NULL);
  for r1 in (select table_name
                   ,column_name
                   ,data_type
                   ,data_length
                   ,char_length
               from USER_TAB_COLUMNS
              where column_id is not null  /* excludes oracle-generated cols for func-based indexes */
                and data_type in ('VARCHAR2','CHAR')
                and char_used = 'B'        /* defined as byte length */
              order by table_name asc, column_id asc)
  loop
    begin
      v_stmt := 'alter table '||r1.table_name||' modify '||r1.column_name||
                ' '||r1.data_type||'('||r1.char_length||')';
      execute immediate v_stmt;
      dbms_output.put_line(substr(v_stmt,1,250));
    exception
      when PARTITIONING_COLUMN then
        dbms_output.put_line('+++ '||r1.table_name||'.'||r1.column_name||
                             ' is partitioning col and cannot be changed');
      when others then NULL;
    end;
  end loop;
end;
/

On 6 October 2011 08:50, Wong Wah Meng-R32813 <r3...@fr...> wrote:
> Hello there,
>
> I am migrating my application from Python 1.5.2 to 2.7.1, from using
> oracledb to cx_Oracle.
>
> My database is set up with the AL32UTF8 character set, in order to support
> Chinese character entry.
>
> I am in the testing stage now. I have never used NVARCHAR2 as my column
> type before, but now I am using it so that I can be sure the number of
> bytes the field can store is predictable, regardless of whether the data
> is plain English or Unicode characters.
>
> When I do inserts and updates, I do not explicitly pass a prefix (u for
> unicode) on the data in my SQL statement. However, when I do a SQL query,
> the returned list of tuples from cx_Oracle seems to attach a u prefix to
> the data from columns of type NVARCHAR2. When I want to extract or parse
> the data, I encounter an error (expectedly).
> I am new to this; can anyone tell me what I should look at in order to
> parse native-language characters successfully after getting the raw
> reply from a cx_Oracle query?
>
> E.g.
>
> >>> reply=a.execute("select * from TEST271")
> >>> reply
> [('\xb2\xe2\xca\xd4', u'\u6d4b\u8bd5')]
> >>> for (text1, text2) in reply:
> ...     print text1, text2
> ...
> ýãòå Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
>   File "/home/r32813/genesis/GEN_DEV_271/Enablers/Python/lib/python2.7/encodings/hp_roman8.py",
>     line 17, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode characters in position
> 0-1: character maps to <undefined>
> >>>
>
> SQL> desc test271
>  Name                                      Null?    Type
>  ----------------------------------------- -------- ----------------------------
>  TEXT1                                              VARCHAR2(16)
>  TEXT2                                              NVARCHAR2(16)
>
> Regards,
> Wah Meng
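For what it's worth, the UnicodeEncodeError in the quoted traceback is purely client-side: cx_Oracle hands back the NVARCHAR2 column as a unicode object, and `print` then re-encodes it with the terminal's codec (hp_roman8 here), which has no mapping for Chinese characters. A minimal sketch of the workaround, reusing the sample value from the session above so no database connection is needed, is to encode explicitly to UTF-8 before writing:

```python
# -*- coding: utf-8 -*-
import sys

# The value cx_Oracle returned for the NVARCHAR2 column in the example above.
text2 = u'\u6d4b\u8bd5'

# Encode explicitly instead of letting `print` pick the terminal codec
# (hp_roman8 cannot represent these characters; UTF-8 can).
utf8_bytes = text2.encode('utf-8')

# Write the raw bytes: sys.stdout takes bytes directly on Python 2,
# and exposes a .buffer for bytes on Python 3.
getattr(sys.stdout, 'buffer', sys.stdout).write(utf8_bytes + b'\n')
```

Alternatively, setting the NLS_LANG environment variable so the client character set matches what the terminal actually displays (e.g. one ending in .AL32UTF8) lets plain `print` work; the explicit encode is just the smallest change.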