Thread: [cx-oracle-users] How to optimize (Pro*C is 5x faster)?

Brought to you by: atuining

cx-oracle-users

[cx-oracle-users] How to optimize (Pro*C is 5x faster)?

From: Christopher J. B. <cjb...@al...> - 2004-12-31 08:38:10

I have some code structured like this:

curs.execute(some_sql)
rows.fetchmany()
while rows:
        for row in rows:
                compiled_curs.execute(None, row[0], row[2])
                more_rows = compiled_curs.fetchmany()
                while more_rows:
                        #do something with them
                        more_rows = compiled_curs.fetchmany()
        rows = curs.fetchmany()
# arraysize is set to 3000 for both cursors

The problem is that it is so slow.  Yes, I realize that its not good to
execute sql in a large for loop, but my problem is that this code is
something like 5x as slow as the corresponding Pro*C code.

Whats going on?  Am I just not using the cx_Oracle module efficiently?  Why
is embedding sql in Pro*C code *so* much faster.

Thanks for the help, I hate coding in Pro*C and C ever since I learned
Python...please tell me I'm doing something wrong, I'd hate to go back to
C!

P.S.  I'm using cx_Oracle-4.1-beta1.

Re: [cx-oracle-users] How to optimize (Pro*C is 5x faster)?

From: Anthony T. <an...@co...> - 2005-01-04 15:07:33

A few points:

a) C is always going to be faster than Python when coded using the same 
algorithms and data structures; in general the advantage of Python is 
the ability to use advanced algorithms and data structures with little 
effort compared to writing the same code in C; I've noted that there is 
about a 10-15% performance penalty for writing code in Python for the 
stuff I've written (although clearly it depends on what has been 
written) but I've also noted that due to the ability to write more 
advanced algorithms in Python some code I originally wrote in C++ was 
actually slower than the original code written in C/C++.

b) The code "compiled_curs.execute(None, row[0], row[2]) is not valid 
since the execute() method only accepts one argument (a list of 
parameters) or a set of keyword parameters; what is going on here??

c) Is there any reason you didn't use an iterator? Using the code "for 
row in curs:" is more efficient than using the method fetchmany() and is 
considerably easier to read as well. I've never checked the difference 
in performance, though -- the timeit module is your friend.... :-)

d) Are you actually doing anything with the rows from the first cursor 
other than using them as input to the second cursor? If so, you'd be far 
better off to create a view that put those two queries together -- the 
performance increase will be quite significant.

e) If you really need performance and don't mind deviating from the DB 
API to get it, there are a number of methods that can be used to avoid 
the overhead of creating result tuples and binding parameters. This will 
definitely bring the performance numbers more in line with the Pro*C 
numbers but at the cost of less portable code.

f) What kind of times are we talking about here? Are you saying that one 
method is 50 ms and the other is 10 ms? Or is it 50 minutes and 10 
minutes? In other words, is this an operation that is being performed 
many times? 40 ms is not a big deal if it happens a few times but a big 
problem if it happens a few million times, right?

So, to summarize: I need more information in order to determine whether 
this is a "real" performance problem or not. If you could give me some 
actual code that I can execute to perform my own testing, that would be 
great.

Christopher J. Bottaro wrote:
> I have some code structured like this:
> 
> curs.execute(some_sql)
> rows.fetchmany()
> while rows:
>         for row in rows:
>                 compiled_curs.execute(None, row[0], row[2])
>                 more_rows = compiled_curs.fetchmany()
>                 while more_rows:
>                         #do something with them
>                         more_rows = compiled_curs.fetchmany()
>         rows = curs.fetchmany()
> # arraysize is set to 3000 for both cursors
> 
> The problem is that it is so slow.  Yes, I realize that its not good to
> execute sql in a large for loop, but my problem is that this code is
> something like 5x as slow as the corresponding Pro*C code.
> 
> Whats going on?  Am I just not using the cx_Oracle module efficiently?  Why
> is embedding sql in Pro*C code *so* much faster.
> 
> Thanks for the help, I hate coding in Pro*C and C ever since I learned
> Python...please tell me I'm doing something wrong, I'd hate to go back to
> C!
> 
> P.S.  I'm using cx_Oracle-4.1-beta1.
> 
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> cx-oracle-users mailing list
> cx-...@li...
> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users

-- 
Anthony Tuininga
an...@co...

Computronix
Distinctive Software. Real People.
Suite 200, 10216 - 124 Street NW
Edmonton, AB, Canada  T5N 4A3
Phone:	(780) 454-3700
Fax:	(780) 454-3838
http://www.computronix.com

Re: [cx-oracle-users] How to optimize (Pro*C is 5x faster)?

From: Jim B. <ma...@ji...> - 2005-01-04 22:10:45

Anthony,

I'm curious about your point (e). Would this in particular be 
executemanyprepared()? And if so, are there any examples of its use? 
There was some discussion about it 
(http://mail.python.org/pipermail/db-sig/2004-April/004026.html), but no 
actual examples. Instead the discussion veered off to the use of 
external tables, but for me, external tables have too many DBA issues to 
actually use.

- Jim

Anthony Tuininga wrote:

> A few points:
>
> ...
>
> e) If you really need performance and don't mind deviating from the DB 
> API to get it, there are a number of methods that can be used to avoid 
> the overhead of creating result tuples and binding parameters. This 
> will definitely bring the performance numbers more in line with the 
> Pro*C numbers but at the cost of less portable code.

Re: [cx-oracle-users] How to optimize (Pro*C is 5x faster)?

From: Anthony T. <an...@co...> - 2005-01-05 19:44:44

The only examples are to be found in CopyData.py which is part of the 
cx_OracleTools package found at http://starship.python.net/crew/atuining.

The documentation for its use (albeit sparse) is to be found in the 
online documentation found at the same web site or included in the 
cx_Oracle package. If you have some particular questions about it, feel 
free to ask. I'll see whether I can explain it better.

Jim Baker wrote:
> Anthony,
> 
> I'm curious about your point (e). Would this in particular be 
> executemanyprepared()? And if so, are there any examples of its use? 
> There was some discussion about it 
> (http://mail.python.org/pipermail/db-sig/2004-April/004026.html), but no 
> actual examples. Instead the discussion veered off to the use of 
> external tables, but for me, external tables have too many DBA issues to 
> actually use.
> 
> - Jim
> 
> 
> Anthony Tuininga wrote:
> 
>> A few points:
>>
>> ...
>>
>> e) If you really need performance and don't mind deviating from the DB 
>> API to get it, there are a number of methods that can be used to avoid 
>> the overhead of creating result tuples and binding parameters. This 
>> will definitely bring the performance numbers more in line with the 
>> Pro*C numbers but at the cost of less portable code.
> 
> 
> 
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> cx-oracle-users mailing list
> cx-...@li...
> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users

-- 
Anthony Tuininga
an...@co...

Computronix
Distinctive Software. Real People.
Suite 200, 10216 - 124 Street NW
Edmonton, AB, Canada  T5N 4A3
Phone:	(780) 454-3700
Fax:	(780) 454-3838
http://www.computronix.com

[cx-oracle-users] Re: How to optimize (Pro*C is 5x faster)?

From: Christopher J. B. <cjb...@al...> - 2005-01-05 04:56:39

Anthony Tuininga wrote:

> a) I've noted that there is
> about a 10-15% performance penalty for writing code in Python for the
> stuff I've written

Right, that is expected and acceptable but a 100% much less a 500% penalty
leads me to believe I'm doing something wrong.

> b) The code "compiled_curs.execute(None, row[0], row[2]) is not valid
> since the execute() method only accepts one argument (a list of
> parameters) or a set of keyword parameters; what is going on here??

Yeah sorry, I wrote that code snippet from memory.  It was originally
suppose to be pseudocode, but since Python is said to be executable
pseudocode, it was just easier to write it in Python since thats what I'm
current used to...;)

> c) Is there any reason you didn't use an iterator? Using the code "for
> row in curs:" is more efficient than using the method fetchmany() and is
> considerably easier to read as well. 

I didn't know you could do that, infact I made a generator so I could say
"for row in gen(curs):".

> d) Are you actually doing anything with the rows from the first cursor
> other than using them as input to the second cursor? If so, you'd be far
> better off to create a view that put those two queries together -- the
> performance increase will be quite significant.

I don't have the code in front of me and I wrote it almost a month ago, but
I think the results are used in file output as well as input to the second
cursor...among other things as well.

I actually know very little about databases, I'll have to research this
"view" concept.

> e) If you really need performance and don't mind deviating from the DB
> API to get it, there are a number of methods that can be used to avoid
> the overhead of creating result tuples and binding parameters. This will
> definitely bring the performance numbers more in line with the Pro*C
> numbers but at the cost of less portable code.

Where can I find documentation on this?

> f) What kind of times are we talking about here? Are you saying that one
> method is 50 ms and the other is 10 ms? Or is it 50 minutes and 10
> minutes? In other words, is this an operation that is being performed
> many times? 40 ms is not a big deal if it happens a few times but a big
> problem if it happens a few million times, right?

Our programs run from 15 mins to hours.  In our development environment,
they run for a little over a minute, where as the Pro*C runs in less than
10 seconds.  I've heard in the production environment, the Pro*C can take
up to 30 mins and as we expand our stores, we can expect times to get over
an hour.

> So, to summarize: I need more information in order to determine whether
> this is a "real" performance problem or not. If you could give me some
> actual code that I can execute to perform my own testing, that would be
> great.

I don't really know what else I could besides give you the actual code, but
its so specific to our system and database, I don't know how you could run
it.

The general algorithms are the same for both the Pro*C and Python
code...thats what is confusing me.  The Python code is actually a little
more stripped down.  I'm almost positive its something on my end thats
causing the problem.  When I have some time, I'm going to rewrite the Pro*C
to exactly mimic the Python code, run some tests, and then get back to you.

Thank you for your reply, now I have a few leads to go on.

> Christopher J. Bottaro wrote:
>> I have some code structured like this:
>> 
>> curs.execute(some_sql)
>> rows.fetchmany()
>> while rows:
>>         for row in rows:
>>                 compiled_curs.execute(None, row[0], row[2])
>>                 more_rows = compiled_curs.fetchmany()
>>                 while more_rows:
>>                         #do something with them
>>                         more_rows = compiled_curs.fetchmany()
>>         rows = curs.fetchmany()
>> # arraysize is set to 3000 for both cursors
>> 
>> The problem is that it is so slow.  Yes, I realize that its not good to
>> execute sql in a large for loop, but my problem is that this code is
>> something like 5x as slow as the corresponding Pro*C code.
>> 
>> Whats going on?  Am I just not using the cx_Oracle module efficiently? 
>> Why is embedding sql in Pro*C code *so* much faster.
>> 
>> Thanks for the help, I hate coding in Pro*C and C ever since I learned
>> Python...please tell me I'm doing something wrong, I'd hate to go back to
>> C!
>> 
>> P.S.  I'm using cx_Oracle-4.1-beta1.
>> 
>> 
>> 
>> -------------------------------------------------------
>> The SF.Net email is sponsored by: Beat the post-holiday blues
>> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
>> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
>> _______________________________________________
>> cx-oracle-users mailing list
>> cx-...@li...
>> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users
>

Re: [cx-oracle-users] Re: How to optimize (Pro*C is 5x faster)?

From: Anthony T. <an...@co...> - 2005-01-05 19:50:01

Christopher J. Bottaro wrote:
> Anthony Tuininga wrote:
> 
> 
>>a) I've noted that there is
>>about a 10-15% performance penalty for writing code in Python for the
>>stuff I've written
> 
> 
> Right, that is expected and acceptable but a 100% much less a 500% penalty
> leads me to believe I'm doing something wrong.

Agreed. Be aware, however, that the overhead of fetching (creating 
tuples) and binding parameters (creating dictionaries) can be as much as 
300% in certain cases. I know this from my writing of CopyData which 
originally was about 3x slower than the original written in C++ but 
after writing executemanyprepared() and fetchraw() and the like I was 
able to get it to perform better than the C++ version.

>>b) The code "compiled_curs.execute(None, row[0], row[2]) is not valid
>>since the execute() method only accepts one argument (a list of
>>parameters) or a set of keyword parameters; what is going on here??
> 
> 
> Yeah sorry, I wrote that code snippet from memory.  It was originally
> suppose to be pseudocode, but since Python is said to be executable
> pseudocode, it was just easier to write it in Python since thats what I'm
> current used to...;)

Understood. :-)

>>c) Is there any reason you didn't use an iterator? Using the code "for
>>row in curs:" is more efficient than using the method fetchmany() and is
>>considerably easier to read as well. 
> 
> 
> I didn't know you could do that, infact I made a generator so I could say
> "for row in gen(curs):".

Sure. The generator will definitely be slower so you'll gain something 
here. :-)

>>d) Are you actually doing anything with the rows from the first cursor
>>other than using them as input to the second cursor? If so, you'd be far
>>better off to create a view that put those two queries together -- the
>>performance increase will be quite significant.
> 
> 
> I don't have the code in front of me and I wrote it almost a month ago, but
> I think the results are used in file output as well as input to the second
> cursor...among other things as well.
> 
> I actually know very little about databases, I'll have to research this
> "view" concept.

Good idea. :-)

>>e) If you really need performance and don't mind deviating from the DB
>>API to get it, there are a number of methods that can be used to avoid
>>the overhead of creating result tuples and binding parameters. This will
>>definitely bring the performance numbers more in line with the Pro*C
>>numbers but at the cost of less portable code.
> 
> 
> Where can I find documentation on this?

The documentation is included in the documentation for cx_Oracle with 
big caveats posted indicating that it deviates from the DB API. The 
documentation might be a little sparse, though, so you might want to 
acquire cx_OracleTools and look at CopyData.py and others of that nature 
to see how I used it myself. If you have further questions, feel free to 
ask.

>>f) What kind of times are we talking about here? Are you saying that one
>>method is 50 ms and the other is 10 ms? Or is it 50 minutes and 10
>>minutes? In other words, is this an operation that is being performed
>>many times? 40 ms is not a big deal if it happens a few times but a big
>>problem if it happens a few million times, right?
> 
> 
> Our programs run from 15 mins to hours.  In our development environment,
> they run for a little over a minute, where as the Pro*C runs in less than
> 10 seconds.  I've heard in the production environment, the Pro*C can take
> up to 30 mins and as we expand our stores, we can expect times to get over
> an hour.

Ok.

>>So, to summarize: I need more information in order to determine whether
>>this is a "real" performance problem or not. If you could give me some
>>actual code that I can execute to perform my own testing, that would be
>>great.
> 
> 
> I don't really know what else I could besides give you the actual code, but
> its so specific to our system and database, I don't know how you could run
> it.

That's usually the issue. You would have to provide an example that was 
similar to your code but that I could build and run myself.

> The general algorithms are the same for both the Pro*C and Python
> code...thats what is confusing me.  The Python code is actually a little
> more stripped down.  I'm almost positive its something on my end thats
> causing the problem.  When I have some time, I'm going to rewrite the Pro*C
> to exactly mimic the Python code, run some tests, and then get back to you.

Sounds good.

> Thank you for your reply, now I have a few leads to go on.

You're welcome.

>>Christopher J. Bottaro wrote:
>>
>>>I have some code structured like this:
>>>
>>>curs.execute(some_sql)
>>>rows.fetchmany()
>>>while rows:
>>>        for row in rows:
>>>                compiled_curs.execute(None, row[0], row[2])
>>>                more_rows = compiled_curs.fetchmany()
>>>                while more_rows:
>>>                        #do something with them
>>>                        more_rows = compiled_curs.fetchmany()
>>>        rows = curs.fetchmany()
>>># arraysize is set to 3000 for both cursors
>>>
>>>The problem is that it is so slow.  Yes, I realize that its not good to
>>>execute sql in a large for loop, but my problem is that this code is
>>>something like 5x as slow as the corresponding Pro*C code.
>>>
>>>Whats going on?  Am I just not using the cx_Oracle module efficiently? 
>>>Why is embedding sql in Pro*C code *so* much faster.
>>>
>>>Thanks for the help, I hate coding in Pro*C and C ever since I learned
>>>Python...please tell me I'm doing something wrong, I'd hate to go back to
>>>C!
>>>
>>>P.S.  I'm using cx_Oracle-4.1-beta1.
>>>
>>>
>>>
>>>-------------------------------------------------------
>>>The SF.Net email is sponsored by: Beat the post-holiday blues
>>>Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
>>>It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
>>>_______________________________________________
>>>cx-oracle-users mailing list
>>>cx-...@li...
>>>https://lists.sourceforge.net/lists/listinfo/cx-oracle-users
>>
> 
> 
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> cx-oracle-users mailing list
> cx-...@li...
> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users

-- 
Anthony Tuininga
an...@co...

Computronix
Distinctive Software. Real People.
Suite 200, 10216 - 124 Street NW
Edmonton, AB, Canada  T5N 4A3
Phone:	(780) 454-3700
Fax:	(780) 454-3838
http://www.computronix.com