[cx-oracle-users] Antw: Re: Problem with internal buffer sizes
Brought to you by:
atuining
From: matilda m. <ma...@gr...> - 2007-01-25 12:32:26
|
Hi Anthony, thank you for reading my long mail and taking the time to look at this problem. At the moment I investigate a similar problem. I created a database with a character set of UTF8. Now it gets really funny. :-)) For example: A table create table tst ( data varchar2(4000) ) means now, that the byte representation of the character stream given by the client has to fit=20 in 4000 Bytes. So I can store only a maximum of 2000 '=C3=84', because the byte representation of '=C3=84' needs 2 bytes. Now something strange: As soon as I try to insert 2001 '=C3=84' with cx_Oracle, I get the strange error message: cx_Oracle.DatabaseError: ORA-01461: Ein LONG-Wert kann nur zur Einf=E2=96= =92gung in eine LONG-Spalte gebunden werden When I try that with TOAD, I get the (more correct) error message: ORA-01704: Zeichenfolge zu lang I found the following hint in the OCI documentation: http://kiti/oracle/ora10gr2/appdev.102/b14250/oci05bnd.htm#i422770=20 Chapter 'Using OCI_ATTR_MAXDATA_SIZE Attribute' and 'Using OCI_ATTR_MAXCHAR_SIZE Attribute'. As far as I understood, with this attributes to declare the buffer length (in bytes, in char) which should be reserved by the server on server side to store the byte representation (database codeset) of the given string. (example: client Latin1: 2000 '=C3=84' =3D client buffer sitze 2000 = bytes,=20 server side: UTF8: 2000 '=C3=84' =3D server buffer size 4000 bytes) I couldn't find any use of setting these two attributes on the bind handle for bound variables in cx_Oracle. Probably this is the reason why I get the wrong error message. I hope you don't start to regret having talked to me. :-))) Best regards and thank you in advance. Andreas Mock >>> "Anthony Tuininga" <ant...@gm...> 25.01.2007 05:06 >>> Looks like you might be right on this. The Oracle folks do give a warning that simply multiplying the number of characters expected by the maximum number of bytes per character can result in significant memory usage -- but its guaranteed to be correct so I think I'll do it. :-) Unless someone else has another brilliant idea?? On 1/24/07, matilda matilda <ma...@gr...> wrote: > Hi Anthony, > > in my opinion the code of cx_Oracle is not realy correct when > dealing with a client side encoding which can result in more than > one byte per character. > > I invested some hours to investigat your code and Oracle OCI > documentation to get closer to the problem. Now I have some > hints and I hope it will help you to extend the code. > > 1) Szenario to reproduce the error: > Database has to be in a character set, which can generate > two byte encodings in UTF-8. In most cases LATIN1 (aka WE8ISO8859P1) > is ok. > > Check: > select value from nls_database_parameters where parameter =3D 'NLS_CHARAC= TERSET'; > > 2) Create the following table: > create table tst ( > data varchar2(2) > ) > > insert into tst values ('1=C3=84'); > > Thsi should fit in the table, because '=C3=84' can be represented > as one byte in Latin1. > > 3) Take this simple python code: > -------------8<-------------------------- > #!/usr/bin/python > # -*- coding: utf8 -*- > import sys > import cx_Oracle as dbi > > if __name__ =3D=3D '__main__': > con =3D dbi.connect('scott/tiger@twws') > cur =3D con.cursor() > cur.execute('select data from tst') > row =3D cur.fetchone() > print row > cur.close() > con.close() > -------------8<-------------------------- > > 4) Now the tests: > export NLS_LANG=3DGERMAN_GERMANY.WE8ISO8859P1 > ./tst.py > Output: ('1\xc4',) > Everything is fine, because on the client side a buffer of length 2 is > allocated and the OCI-define is limited to 2. > --- > export NLS_LANG=3DGERMAN_GERMANY.UTF8 > ./tst.py > Output: > Traceback (most recent call last): > File "./tst.py", line 10, in ? > row =3D cur.fetchone() > cx_Oracle.DatabaseError: column at array pos 0 fetched with error: 1406 > > Reason (as far as I can see): 2 characters =3D 2 bytes on the server = side > are expanded to 2 characters =3D 3 bytes on client side. OCI is telling = that > the length (db side) of that variable is 2, so 2 bytes are allocated. > Fetch is done in a 2 byte array, that's too small =3D> Error message. > > (comment: If I use varchar2(1), it seems to work because you add > some bytes in any case to the buffer length.) > > 5) I have seen, that you made the assumption, that strings (not lobs) = have > a maximum length of 4000. > StringVar.c:#define MAX_STRING_LENGTH 4000 > As far as I understood the Oracle documentation, this is true in terms > of byte representation on the db side. This is not true on the client = side. > > 6) I changed your code to get a simple trace facility. If you like that = code > contact me. I found the following paragraphof code in Environment.c: > --------------8<--------------------- > #ifdef OCI_NLS_CHARSET_MAXBYTESZ > status =3D OCINlsNumericInfoGet(environment->handle, > environment->errorHandle, &environment->maxBytesPerCharacter,= > OCI_NLS_CHARSET_MAXBYTESZ); > if (Environment_CheckForError(environment, status, > "Environment_New(): get max bytes per character") < 0) { > Py_DECREF(environment); > return NULL; > } > > // acquire whether character set is fixed width > status =3D OCINlsNumericInfoGet(environment->handle, > environment->errorHandle, &environment->fixedWidth, > OCI_NLS_CHARSET_FIXEDWIDTH); > if (Environment_CheckForError(environment, status, > "Environment_New(): determine if charset is fixed width") < = 0) { > Py_DECREF(environment); > return NULL; > } > #endif > trace("Environment.c: maxBytesPerCharacter: %d\n", environment->maxBy= tesPerC > haracter); // thats from me > > --------------8<--------------------- > This value was 3 when I worked with UTF8 > and only 1 when I chose LATIN1. As far as I can see, you > use this value only with lobs. Is that true? > Now my hint: Could you adjust the allocated client side buffer length > with maxBytesPerCharacter. As I'm totally new to your code, > I'm doing really hard finding all places and not forgetting some > side effects. So, I can't give you a patch. > Until now I only checked the code for "defined" variables. The > same must be true for "bind" variables. > So, I'm pretty sure, that the following code snippet is also not really > valid: > --------------- > else if (PyString_Check(value)) { > size =3D PyString_GET_SIZE(value); > trace("Variable.c: size: %d\n", size); > if (size > MAX_STRING_LENGTH) > varType =3D &vt_LongString; > ------------------ > Szenario. I have a litte table like: > create table tst ( > data varchar2(4000) > ) > and a program like: > ------------------ > #!/usr/bin/python > # -*- coding: utf8 -*- > import sys > import cx_Oracle as dbi > > if __name__ =3D=3D '__main__': > con =3D dbi.connect('scott/tiger@twws') > cur =3D con.cursor() > val =3D '=C3=84' * 4000 > print 'Length: ', len(val) > con.begin() > cur.execute('insert into tst values (:val)', (val, )) > con.commit() > cur.close() > con.close() > ------------------------------ > The mentione c code above assumes in any case that > I whant to bind a long variable. Interestingly, as far as > this long variable is smalller or equal to the size of the > destination varchar2-column (db-side, when used UTF-8 > 8000byte=3D>4000byte), the insert is done even if it's bound > as long type. Orcale seems to perform a transparant conversion. > As soon as the destination is too small (LATINI1, 8000byte =3D> = 8000byte) > I get an error about long variable binding what will confuse someone > not knowing that you internally switched to long-binding. > (Traceback (most recent call last): > File "./i.py", line 12, in ? > cur.execute('insert into tst values (:val)', (val, )) > cx_Oracle.DatabaseError: ORA-01461: Ein LONG-Wert kann nur zur Einf=E2=96= =92gung in eine LONG-Spalte gebunden werden > ) > > > 7) In any case this is no criticism about your work. But I would > like to work with UTF-8 encoding on client side as transparently > as I can with Latin1. I like cx_Oracle with python much more > than the Perl DBI stuff. > As long as the different semantics of characters and bytes is > the way it is in python 2.x < python 3000, the byte arrays > delivered by cx_Oracle to python should reflect the correct > encoding of the db-client-side. > > I hope you can correct this problem. As you can see, I like to invest > a couple of time to assist or help you. > > Best regards > Andreas Mock > > P.S.: Ignore all mistakes in this English text. > > > > 7) Found the following documentation of byte expansion while using > unicode. As far as I can see, the maximum expanded buffer size can be > 4000 * 4 in worst case. > http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch7pr= ogrunicode.htm#i1006452=20 > > > > >>> "Anthony Tuininga" <ant...@gm...> 23.01.2007 18:59 >>> > On 1/23/07, matilda matilda <ma...@gr...> wrote: > > > As far as I know, sqlplus does really double the internal buffers > > to do the communication for the following szenario: > > DB character set is Latin1 =3D> I can store a maximum of 4000 > > German umlauts to the a string column. That results in 8000 > > bytes on the client side if there is a multibyte encoding enabled. > > Hmm, that's worth trying, I suppose. Would you be able to provide a > small test case that demonstrates the problem? Something that has 4000 > characters in it which will actually result in considerably larger > number of bytes. I can then try that and see if your assumption is > correct. If it is, then the matter is fairly simple to address. > > > -------------------------------------------------------------------------= > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share = your > opinions on IT & business topics through brief surveys - and earn cash > http://www.techsay.com/default.php?page=3Djoin.php&p=3Dsourceforge&CID=3D= DEVDEV=20 > _______________________________________________ > cx-oracle-users mailing list > cx-...@li...=20 > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users=20 > ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share = your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=3Djoin.php&p=3Dsourceforge&CID=3DDE= VDEV=20 _______________________________________________ cx-oracle-users mailing list cx-...@li...=20 https://lists.sourceforge.net/lists/listinfo/cx-oracle-users |