Re: [cx-oracle-users] Problem with internal buffer sizes
From: matilda m. <ma...@gr...> - 2007-01-24 14:22:48
Hi Anthony,

in my opinion the cx_Oracle code is not really correct when dealing with a client-side encoding that can produce more than one byte per character. I invested some hours investigating your code and the Oracle OCI documentation to get closer to the problem. Now I have some hints, and I hope they will help you to extend the code.

1) Scenario to reproduce the error: the database has to be in a character set whose characters can take two bytes in UTF-8. In most cases LATIN1 (aka WE8ISO8859P1) is fine.

Check:

select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

2) Create the following table and row:

create table tst (data varchar2(2));
insert into tst values ('1Ä');

This should fit in the table, because 'Ä' can be represented as one byte in Latin1.

3) Take this simple Python code:

-------------8<--------------------------
#!/usr/bin/python
# -*- coding: utf8 -*-

import sys
import cx_Oracle as dbi

if __name__ == '__main__':
    con = dbi.connect('scott/tiger@twws')
    cur = con.cursor()
    cur.execute('select data from tst')
    row = cur.fetchone()
    print row
    cur.close()
    con.close()
-------------8<--------------------------

4) Now the tests:

export NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1
./tst.py

Output: ('1\xc4',)

Everything is fine, because on the client side a buffer of length 2 is allocated and the OCI define is limited to 2.

---

export NLS_LANG=GERMAN_GERMANY.UTF8
./tst.py

Output:

Traceback (most recent call last):
  File "./tst.py", line 10, in ?
    row = cur.fetchone()
cx_Oracle.DatabaseError: column at array pos 0 fetched with error: 1406

The reason, as far as I can see: 2 characters = 2 bytes on the server side expand to 2 characters = 3 bytes on the client side. OCI reports that the (db-side) length of that variable is 2, so 2 bytes are allocated. The fetch is done into a 2-byte array, which is too small => error 1406 (ORA-01406: fetched column value was truncated). (Comment: with varchar2(1) it seems to work, because you add some bytes to the buffer length in any case.)

5) I have seen that you made the assumption that strings (not LOBs) have a maximum length of 4000:

StringVar.c:#define MAX_STRING_LENGTH 4000

As far as I understand the Oracle documentation, this is true for the byte representation on the db side. It is not true on the client side.

6) I changed your code to get a simple trace facility; if you would like that code, contact me. I found the following paragraph of code in Environment.c:

--------------8<---------------------
#ifdef OCI_NLS_CHARSET_MAXBYTESZ
    status = OCINlsNumericInfoGet(environment->handle,
            environment->errorHandle,
            &environment->maxBytesPerCharacter,
            OCI_NLS_CHARSET_MAXBYTESZ);
    if (Environment_CheckForError(environment, status,
            "Environment_New(): get max bytes per character") < 0) {
        Py_DECREF(environment);
        return NULL;
    }

    // acquire whether character set is fixed width
    status = OCINlsNumericInfoGet(environment->handle,
            environment->errorHandle, &environment->fixedWidth,
            OCI_NLS_CHARSET_FIXEDWIDTH);
    if (Environment_CheckForError(environment, status,
            "Environment_New(): determine if charset is fixed width") < 0) {
        Py_DECREF(environment);
        return NULL;
    }
#endif
    trace("Environment.c: maxBytesPerCharacter: %d\n",
            environment->maxBytesPerCharacter);   // that's from me
--------------8<---------------------

This value was 3 when I worked with UTF8 and only 1 when I chose LATIN1. As far as I can see, you use this value only with LOBs. Is that true? Now my hint: could you scale the allocated client-side buffer length by maxBytesPerCharacter? A sketch of what I mean follows.
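Here is a minimal, self-contained sketch of that sizing rule. It is not cx_Oracle code; the function name and parameters are mine, and maxBytesPerCharacter stands for the value read via OCINlsNumericInfoGet() above.

--------------8<---------------------
/* sketch: size the define buffer for the client character set */
#include <stdio.h>

static unsigned int DefineBufferSize(unsigned int columnSizeInChars,
                                     unsigned int maxBytesPerCharacter)
{
    /* a VARCHAR2(n) column holds up to n characters; in the client
       character set each character may need up to
       maxBytesPerCharacter bytes */
    return columnSizeInChars * maxBytesPerCharacter;
}

int main(void)
{
    /* the failing case from point 4): VARCHAR2(2) */
    printf("Latin1 client (1 byte/char):  %u bytes\n",
            DefineBufferSize(2, 1));    /* 2, as allocated today */
    printf("UTF8 client   (3 bytes/char): %u bytes\n",
            DefineBufferSize(2, 3));    /* 6, enough for '1Ä' in UTF-8 */
    return 0;
}
--------------8<---------------------

With this rule the UTF8 run from point 4) would get a 6-byte buffer instead of the 2-byte buffer that triggers ORA-01406.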
As I'm totally new to your code, I'm having a really hard time finding all the relevant places without overlooking side effects, so I can't give you a patch. Until now I have only checked the code for "defined" variables; the same problem must exist for "bind" variables. So I'm pretty sure that the following code snippet is also not really valid:

---------------
else if (PyString_Check(value)) {
    size = PyString_GET_SIZE(value);
    trace("Variable.c: size: %d\n", size);
    if (size > MAX_STRING_LENGTH)
        varType = &vt_LongString;
------------------

Scenario: I have a little table like

create table tst (data varchar2(4000));

and a program like:

------------------
#!/usr/bin/python
# -*- coding: utf8 -*-

import sys
import cx_Oracle as dbi

if __name__ == '__main__':
    con = dbi.connect('scott/tiger@twws')
    cur = con.cursor()
    val = 'Ä' * 4000
    print 'Length: ', len(val)
    con.begin()
    cur.execute('insert into tst values (:val)', (val, ))
    con.commit()
    cur.close()
    con.close()
------------------------------

The C code above then assumes in any case that I want to bind a long variable, because the UTF-8 value is 8000 bytes. Interestingly, as long as this long value is smaller than or equal to the size of the destination varchar2 column on the db side (UTF-8 client: 8000 bytes => 4000 bytes), the insert succeeds even though it is bound as a long type; Oracle seems to perform a transparent conversion. As soon as the destination is too small (LATIN1: 8000 bytes => 8000 bytes), I get an error about long variable binding, which will confuse anyone who does not know that you internally switched to long binding (a sketch of the check I have in mind follows at the end of this mail):

Traceback (most recent call last):
  File "./i.py", line 12, in ?
    cur.execute('insert into tst values (:val)', (val, ))
cx_Oracle.DatabaseError: ORA-01461: Ein LONG-Wert kann nur zur Einfügung in eine LONG-Spalte gebunden werden
(in English: a LONG value can only be bound for insert into a LONG column)

7) In any case this is no criticism of your work. But I would like to work with a UTF-8 client-side encoding as transparently as I can with Latin1. I like cx_Oracle with Python much more than the Perl DBI stuff. As long as the semantics of characters vs. bytes stay the way they are in Python 2.x (before Python 3000), the byte strings delivered by cx_Oracle to Python should reflect the correct encoding of the db client side. I hope you can correct this problem. As you can see, I am happy to invest some time to assist you.

Best regards
Andreas Mock

P.S.: Ignore all mistakes in this English text.

P.P.S.: I found the following documentation on byte expansion when using Unicode. As far as I can see, the maximum expanded buffer size can be 4000 * 4 in the worst case:

http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch7progrunicode.htm#i1006452

>>> "Anthony Tuininga" <ant...@gm...> 23.01.2007 18:59 >>>
On 1/23/07, matilda matilda <ma...@gr...> wrote:
> As far as I know, sqlplus really does double the internal buffers
> to do the communication for the following scenario:
> DB character set is Latin1 => I can store a maximum of 4000
> German umlauts in a string column. That results in 8000
> bytes on the client side if there is a multibyte encoding enabled.

Hmm, that's worth trying, I suppose. Would you be able to provide a
small test case that demonstrates the problem? Something that has 4000
characters in it which will actually result in a considerably larger
number of bytes. I can then try that and see if your assumption is
correct. If it is, then the matter is fairly simple to address.
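P.P.P.S.: To make the bind hint above concrete, here is a minimal, self-contained sketch of the cut-over check I have in mind. It is not a patch against Variable.c; the function name and parameters are mine, and maxBytesPerCharacter is again the value from Environment.c.

---------------8<---------------------
/* sketch: only fall back to LONG binding when the value cannot fit
   a VARCHAR2(4000) even after client-to-server conversion */
#include <stdio.h>

#define MAX_STRING_LENGTH 4000   /* server-side byte limit, as in StringVar.c */

static int NeedsLongBinding(unsigned int sizeInBytes,
                            unsigned int maxBytesPerCharacter)
{
    /* sizeInBytes is the client-side length (PyString_GET_SIZE);
       each character occupies at most maxBytesPerCharacter of those
       bytes, so up to MAX_STRING_LENGTH * maxBytesPerCharacter client
       bytes may still fit the server-side limit after conversion */
    return sizeInBytes > MAX_STRING_LENGTH * maxBytesPerCharacter;
}

int main(void)
{
    /* 'Ä' * 4000 on a UTF-8 client is 8000 bytes */
    printf("current rule:  %s\n",
            8000 > MAX_STRING_LENGTH ? "LONG" : "VARCHAR2");   /* LONG */
    printf("proposed rule: %s\n",
            NeedsLongBinding(8000, 3) ? "LONG" : "VARCHAR2");  /* VARCHAR2 */
    return 0;
}
---------------8<---------------------

With maxBytesPerCharacter = 3 the 8000-byte value stays below 12000 and would be bound as an ordinary string, so a value that really is too long would produce an understandable length error from Oracle instead of the confusing ORA-01461.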