Re: [cx-oracle-users] Extracting data in Native Language: Error UnicodeEncodeError
From: Chris G. <chr...@to...> - 2011-10-06 11:44:19
Just for reference, if you're using Oracle 10 or later you don't need to use NVARCHAR2 to store multi-byte characters. You can use VARCHAR2 (x CHAR) instead. The "x" here limits the field to a certain number of characters, not bytes. By default, the length specified for VARCHAR2 is in bytes unless you add the CHAR qualifier. However, you can change this at system level by setting NLS_LENGTH_SEMANTICS to CHAR. That way, if you declare a field as, say, VARCHAR2(10), it will default to 10 CHAR, not 10 BYTE.

Even if you've already set up some VARCHAR2 fields, you can easily change the definitions to switch them all to CHAR. Below is a script to do this which I've used myself on databases. The nice thing about doing this is that you can then carry on defining fields in future as, say, VARCHAR2(10) and know that each will hold 10 characters, regardless of how many bytes each character is, and you don't need to bother with NVARCHAR2 at all. I've used this approach on several systems and these days I never give any thought to what characters are being stored.

Note - if you're using partitioning and have already got partition key columns which are VARCHAR2, trying to change them from BYTE to CHAR will raise an exception. The script picks this up and reports it.

--
-- this is to change dictionary definitions for existing cols from BYTE to CHAR
-- The database should already have had NLS_LENGTH_SEMANTICS = CHAR set
-- This has to be run in each application schema that has tables
-- (not in any Oracle schemas like SYS or SYSTEM)
-- ORA-14060 - cannot change length of partitioning column.
-- So this needs to be done before any tables are created where partitioning
-- is on an affected (VARCHAR2 or CHAR) col

alter session set NLS_LENGTH_SEMANTICS = CHAR;

declare
  v_stmt varchar2(1000);
  PARTITIONING_COLUMN EXCEPTION;
  pragma exception_init(PARTITIONING_COLUMN, -14060);
begin
  dbms_output.enable(NULL);
  for r1 in (select table_name
                   ,column_name
                   ,data_type
                   ,data_length
                   ,char_length
               from USER_TAB_COLUMNS
              where column_id is not null  /* excludes oracle-generated cols for func-based indexes */
                and data_type in ('VARCHAR2','CHAR')
                and char_used = 'B'        /* defined as byte length */
              order by table_name asc, column_id asc)
  loop
    begin
      v_stmt := 'alter table '||r1.table_name||' modify '||r1.column_name||
                ' '||r1.data_type||'('||r1.char_length||')';
      execute immediate v_stmt;
      dbms_output.put_line(substr(v_stmt,1,250));
    exception
      when PARTITIONING_COLUMN then
        dbms_output.put_line('+++ '||r1.table_name||'.'||r1.column_name||
                             ' is partitioning col and cannot be changed');
      when others then NULL;
    end;
  end loop;
end;
/

On 6 October 2011 08:50, Wong Wah Meng-R32813 <r3...@fr...> wrote:
> Hello there,
>
> I am migrating my application from Python 1.5.2 to 2.7.1, from using
> oracledb to cx_Oracle.
>
> My database is set up with the AL32UTF8 character set, in order to support
> Chinese character entry.
>
> I am in the testing stage now. I have never used NVARCHAR2 as my column
> type before, but now I am using it so that I can be sure the number of
> bytes the field can store is predictable, regardless of whether the data
> is plain English or Unicode characters.
>
> When I do inserts and updates, I do not explicitly pass a prefix (u for
> unicode) on the data in my SQL statement. However, when I do a SQL query,
> the returned list of tuples from cx_Oracle seems to attach a u prefix to
> the data from columns of type NVARCHAR2. When I want to extract or parse
> the data, I encounter an error (expectedly).
> I am new to this; can anyone tell me what I should look at in order to
> parse native-language characters successfully after getting the raw
> reply from a cx_Oracle query?
>
> E.g.
>
> >>> reply=a.execute("select * from TEST271")
> >>> reply
> [('\xb2\xe2\xca\xd4', u'\u6d4b\u8bd5')]
> >>> for (text1, text2) in reply:
> ...     print text1, text2
> ...
> ýãòå Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
>   File "/home/r32813/genesis/GEN_DEV_271/Enablers/Python/lib/python2.7/encodings/hp_roman8.py",
>     line 17, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode characters in position
> 0-1: character maps to <undefined>
> >>>
>
> SQL> desc test271
>  Name                                      Null?    Type
>  ----------------------------------------- -------- ----------------------------
>  TEXT1                                              VARCHAR2(16)
>  TEXT2                                              NVARCHAR2(16)
>
> Regards,
> Wah Meng
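For what it's worth, the UnicodeEncodeError in the quoted traceback is purely client-side: cx_Oracle hands back the NVARCHAR2 column as a unicode object, and `print` then re-encodes it with the terminal's codec (hp_roman8 here), which has no mapping for Chinese characters. A minimal sketch of the workaround, reusing the sample value from the session above so no database connection is needed, is to encode explicitly to UTF-8 before writing:

```python
# -*- coding: utf-8 -*-
import sys

# The value cx_Oracle returned for the NVARCHAR2 column in the example above.
text2 = u'\u6d4b\u8bd5'

# Encode explicitly instead of letting `print` pick the terminal codec
# (hp_roman8 cannot represent these characters; UTF-8 can).
utf8_bytes = text2.encode('utf-8')

# Write the raw bytes: sys.stdout takes bytes directly on Python 2,
# and exposes a .buffer for bytes on Python 3.
getattr(sys.stdout, 'buffer', sys.stdout).write(utf8_bytes + b'\n')
```

Alternatively, setting the NLS_LANG environment variable so the client character set matches what the terminal actually displays (e.g. one ending in .AL32UTF8) lets plain `print` work; the explicit encode is just the smallest change.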