Thread: [cx-oracle-users] CLOB performance issue
Brought to you by:
atuining
From: Vladimir R. <gre...@gm...> - 2014-06-25 13:58:27
|
Hi. I have encountered a performance problem when inserting values into a column of CLOB datatype. For example I have a table: create table table_clob (msg_id number, message clob) Here is my code to reproduce the issue: ----- import time import cx_Oracle import sys print('Python', sys.version, 'on', sys.platform) print('cx_Oracle.version:', cx_Oracle.version) num_records = 100 con = cx_Oracle.connect('user/password@sid') print('DB version:', con.version) cur = con.cursor() cur.prepare("insert into table_clob (msg_id, message) values (:msg_id, :msg)") cur.bindarraysize = num_records msg_arr = cur.var(cx_Oracle.CLOB, arraysize=num_records) text = '$'*2**20 # 1 MB of text rows = [] # Prepare data. start_time = time.perf_counter() for id in range(num_records): msg_arr.setvalue(id, text) rows.append( (id, msg_arr) ) # Insert data. print('{} records prepared, {:.3f} s' .format(num_records, time.perf_counter() - start_time)) start_time = time.perf_counter() cur.executemany(None, rows) con.commit() print('{} records inserted, {:.3f} s' .format(num_records, time.perf_counter() - start_time)) cur.close() con.close() ----- which yields the results: ----- Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32 cx_Oracle.version: 5.1.3 DB version: 11.2.0.1.0 100 records prepared, 5.384 s 100 records inserted, 4.536 s ----- Versions of my DB and my client are the same. What's worrying me is the time of preparation of msg_arr structure. 5 seconds is abnormally much for copying of 100 MB in memory. 4.5 seconds seems OK for sending it over the network. The problematic step is msg_arr.setvalue(id, text). If I comment it, script takes just milliseconds to complete (inserting null into CLOB column of course). I also noticed a persistent disk activity during the "prepare" stage. Perhaps cx_Oracle library stores some data in temporary files on disk? This could be the reason of the problem... I wonder if there is a way to improve performance of "preparation" part of my code. Thanks in advance for your help. |
From: avinash n. <avi...@or...> - 2014-07-14 12:35:56
|
Hello Vladimir, It looks like in the current code the variable.setvalue() function for LOBs uses OCILobTemporary() which is a call to the server. 100 MB would be transferred to the server during this call. Hence the call appears to take more time at the time of filling. Subsequent execute() will just be adjusting the descriptors in the table in the db. End to end there should not be any additional processing. Best Regards, Avinash -------- Original Message -------- > Subject: [cx-oracle-users] CLOB performance issue > Date: Wed, 25 Jun 2014 17:57:58 +0400 > From: Vladimir Ryabtsev <gre...@gm...> > Reply-To: cx-...@li... > To: cx-...@li... > > > > Hi. > > I have encountered a performance problem when inserting values into a > column of CLOB datatype. For example I have a table: > > create table table_clob (msg_id number, message clob) > > Here is my code to reproduce the issue: > > ----- > import time > import cx_Oracle > import sys > > print('Python', sys.version, 'on', sys.platform) > print('cx_Oracle.version:', cx_Oracle.version) > num_records = 100 > con = cx_Oracle.connect('user/password@sid') > print('DB version:', con.version) > cur = con.cursor() > cur.prepare("insert into table_clob (msg_id, message) values (:msg_id, > :msg)") > cur.bindarraysize = num_records > msg_arr = cur.var(cx_Oracle.CLOB, arraysize=num_records) > text = '$'*2**20 Â Â # 1 MB of text > rows = [] > > # Prepare data. > start_time = time.perf_counter() > for id in range(num_records): > Â Â msg_arr.setvalue(id, text) > Â Â rows.append( (id, msg_arr) ) > > # Insert data. > print('{} records prepared, {:.3f} s' > Â Â Â .format(num_records, time.perf_counter() - start_time)) > start_time = time.perf_counter() > cur.executemany(None, rows) > con.commit() > print('{} records inserted, {:.3f} s' > Â Â Â .format(num_records, time.perf_counter() - start_time)) > > cur.close() > con.close() > ----- > > which yields the results: > > ----- > Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 > 64 bit (AMD64)] on win32 > cx_Oracle.version: 5.1.3 > DB version: 11.2.0.1.0 > 100 records prepared, 5.384 s > 100 records inserted, 4.536 s > ----- > > Versions of my DB and my client are the same. > > What's worrying me is the time of preparation of msg_arr structure. 5 > seconds is abnormally much for copying of 100 MB in memory. 4.5 > seconds seems OK for sending it over the network. > The problematic step is msg_arr.setvalue(id, text). If I comment it, > script takes just milliseconds to complete (inserting null into CLOB > column of course). > I also noticed a persistent disk activity during the "prepare" stage. > Perhaps cx_Oracle library stores some data in temporary files on disk? > This could be the reason of the problem... > > I wonder if there is a way to improve performance of "preparation" > part of my code. > > Thanks in advance for your help. > > |
From: Vladimir R. <gre...@gm...> - 2014-07-15 21:00:42
|
2014-07-14 16:35 GMT+04:00 avinash nandakumar <avi...@or... >: > It looks like in the current code the variable.setvalue() function for > LOBs uses OCILobTemporary() which is a call to the server. > 100 MB would be transferred to the server during this call. Hence the > call appears to take more time at the time of filling. > Subsequent execute() will just be adjusting the descriptors in the table > in the db. > End to end there should not be any additional processing. > > Best Regards, > Avinash > Hello Avinash, is it really necessary to call OCILobTemporary() in OCI level? Do you consider implementation of cx_Oracle correct? If I write my code in Java (for example) it works much faster, that's why I suppose the problem is in cx_Oracle. I see the difference in performance only in case I work CLOBs, performance is comparable if I use simple data types. |
From: avinash n. <avi...@or...> - 2014-07-18 09:06:50
|
Hello Vladimir, There is another way to insert data into the table which does not require you to call setvalue(). The way is first you insert an empty CLOB into the table. Then you fetch the CLOB variable for update and write data into the CLOB. I can see better performance in this case. I have demonstrated the usage below:- / //#create table table_clob ( msg_id number, message CLOB)/ /import cx_Oracle// //import time// //con = cx_Oracle.connect("scott/tiger@inst1")// // //num_records = 100// //cur = con.cursor()// //text = '$'*2**20// //rows =[]// // //start_time = time.time()// // //for id in range(num_records):// // rows.append([id+1])// // //cur.bindarraysize = num_records// //cur.executemany("insert into table_clob (msg_id, message) values (:1, empty_clob())", rows)// // //selstmt = "select message from table_clob where msg_id between 1 AND :1 for update"// //cur.execute(selstmt, [num_records])// //for id in range(num_records):// // results = cur.fetchone()// // results[0].write(text)// // //con.commit()// // //end_time = time.time()// //print end_time -start_time// //cur.close()// //con.close()/ I hope this will help you solve your performance problem. Best Regards, Avinash On 7/16/2014 2:30 AM, Vladimir Ryabtsev wrote: > > 2014-07-14 16:35 GMT+04:00 avinash nandakumar > <avi...@or... <mailto:avi...@or...>>: > > It looks like in the current code the variable.setvalue() function > for LOBs uses OCILobTemporary() which is a call to the server. > 100 MB would be transferred to the server during this call. Hence > the call appears to take more time at the time of filling. > Subsequent execute() will just be adjusting the descriptors in the > table in the db. > End to end there should not be any additional processing. > > Best Regards, > Avinash > > > Hello Avinash, > > is it really necessary to call OCILobTemporary() in OCI level? Do you > consider implementation of cx_Oracle correct? > > If I write my code in Java (for example) it works much faster, that's > why I suppose the problem is in cx_Oracle. I see the difference in > performance only in case I work CLOBs, performance is comparable if I > use simple data types. > > > > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds > > > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users |
From: Vladimir R. <gre...@gm...> - 2014-07-22 14:25:07
|
Hello avinash, Very weird workaround but it is really solution! It works 60% faster than my variant. Thanks. But it seems, in general, this way has some disadvantages: - you must be able to select just inserted records fast (i.e. you must have indexes) otherwise performance advantage will be lost; - extra locks on table records; - more complex logic of program. Although this points may not be essential, this is actually a sort of trick trick and I believe authors should somehow fix the library to make it usage more straighforward. 2014-07-18 13:06 GMT+04:00 avinash nandakumar <avi...@or... >: > There is another way to insert data into the table which does not require > you to call setvalue(). > The way is first you insert an empty CLOB into the table. Then you fetch > the CLOB variable for update and write data into the CLOB. > I can see better performance in this case. > I have demonstrated the usage below:- > > *#create table table_clob ( msg_id number, message CLOB)* > > *import cx_Oracle* > *import time* > *con = cx_Oracle.connect("scott/tiger@inst1")* > > *num_records = 100* > *cur = con.cursor()* > *text = '* > **2**20* > *rows =[]* > > *start_time = time.time()* > > *for id in range(num_records):* > * rows.append([id+1])* > > *cur.bindarraysize = num_records* > *cur.executemany("insert into table_clob (msg_id, message) values (:1, > empty_clob())", rows)* > > *selstmt = "select message from table_clob where msg_id between 1 AND :1 > for update"* > *cur.execute(selstmt, [num_records])* > *for id in range(num_records):* > * results = cur.fetchone()* > * results[0].write(text)* > > *con.commit()* > > *end_time = time.time()* > *print end_time -start_time* > *cur.close()* > *con.close()* |