Thread: [cx-oracle-users] How to optimize (Pro*C is 5x faster)?
Brought to you by:
atuining
From: Christopher J. B. <cjb...@al...> - 2004-12-31 08:38:10
|
I have some code structured like this: curs.execute(some_sql) rows.fetchmany() while rows: for row in rows: compiled_curs.execute(None, row[0], row[2]) more_rows = compiled_curs.fetchmany() while more_rows: #do something with them more_rows = compiled_curs.fetchmany() rows = curs.fetchmany() # arraysize is set to 3000 for both cursors The problem is that it is so slow. Yes, I realize that its not good to execute sql in a large for loop, but my problem is that this code is something like 5x as slow as the corresponding Pro*C code. Whats going on? Am I just not using the cx_Oracle module efficiently? Why is embedding sql in Pro*C code *so* much faster. Thanks for the help, I hate coding in Pro*C and C ever since I learned Python...please tell me I'm doing something wrong, I'd hate to go back to C! P.S. I'm using cx_Oracle-4.1-beta1. |
From: Anthony T. <an...@co...> - 2005-01-04 15:07:33
|
A few points: a) C is always going to be faster than Python when coded using the same algorithms and data structures; in general the advantage of Python is the ability to use advanced algorithms and data structures with little effort compared to writing the same code in C; I've noted that there is about a 10-15% performance penalty for writing code in Python for the stuff I've written (although clearly it depends on what has been written) but I've also noted that due to the ability to write more advanced algorithms in Python some code I originally wrote in C++ was actually slower than the original code written in C/C++. b) The code "compiled_curs.execute(None, row[0], row[2]) is not valid since the execute() method only accepts one argument (a list of parameters) or a set of keyword parameters; what is going on here?? c) Is there any reason you didn't use an iterator? Using the code "for row in curs:" is more efficient than using the method fetchmany() and is considerably easier to read as well. I've never checked the difference in performance, though -- the timeit module is your friend.... :-) d) Are you actually doing anything with the rows from the first cursor other than using them as input to the second cursor? If so, you'd be far better off to create a view that put those two queries together -- the performance increase will be quite significant. e) If you really need performance and don't mind deviating from the DB API to get it, there are a number of methods that can be used to avoid the overhead of creating result tuples and binding parameters. This will definitely bring the performance numbers more in line with the Pro*C numbers but at the cost of less portable code. f) What kind of times are we talking about here? Are you saying that one method is 50 ms and the other is 10 ms? Or is it 50 minutes and 10 minutes? In other words, is this an operation that is being performed many times? 40 ms is not a big deal if it happens a few times but a big problem if it happens a few million times, right? So, to summarize: I need more information in order to determine whether this is a "real" performance problem or not. If you could give me some actual code that I can execute to perform my own testing, that would be great. Christopher J. Bottaro wrote: > I have some code structured like this: > > curs.execute(some_sql) > rows.fetchmany() > while rows: > for row in rows: > compiled_curs.execute(None, row[0], row[2]) > more_rows = compiled_curs.fetchmany() > while more_rows: > #do something with them > more_rows = compiled_curs.fetchmany() > rows = curs.fetchmany() > # arraysize is set to 3000 for both cursors > > The problem is that it is so slow. Yes, I realize that its not good to > execute sql in a large for loop, but my problem is that this code is > something like 5x as slow as the corresponding Pro*C code. > > Whats going on? Am I just not using the cx_Oracle module efficiently? Why > is embedding sql in Pro*C code *so* much faster. > > Thanks for the help, I hate coding in Pro*C and C ever since I learned > Python...please tell me I'm doing something wrong, I'd hate to go back to > C! > > P.S. I'm using cx_Oracle-4.1-beta1. > > > > ------------------------------------------------------- > The SF.Net email is sponsored by: Beat the post-holiday blues > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. > It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users -- Anthony Tuininga an...@co... Computronix Distinctive Software. Real People. Suite 200, 10216 - 124 Street NW Edmonton, AB, Canada T5N 4A3 Phone: (780) 454-3700 Fax: (780) 454-3838 http://www.computronix.com |
From: Jim B. <ma...@ji...> - 2005-01-04 22:10:45
|
Anthony, I'm curious about your point (e). Would this in particular be executemanyprepared()? And if so, are there any examples of its use? There was some discussion about it (http://mail.python.org/pipermail/db-sig/2004-April/004026.html), but no actual examples. Instead the discussion veered off to the use of external tables, but for me, external tables have too many DBA issues to actually use. - Jim Anthony Tuininga wrote: > A few points: > > ... > > e) If you really need performance and don't mind deviating from the DB > API to get it, there are a number of methods that can be used to avoid > the overhead of creating result tuples and binding parameters. This > will definitely bring the performance numbers more in line with the > Pro*C numbers but at the cost of less portable code. |
From: Anthony T. <an...@co...> - 2005-01-05 19:44:44
|
The only examples are to be found in CopyData.py which is part of the cx_OracleTools package found at http://starship.python.net/crew/atuining. The documentation for its use (albeit sparse) is to be found in the online documentation found at the same web site or included in the cx_Oracle package. If you have some particular questions about it, feel free to ask. I'll see whether I can explain it better. Jim Baker wrote: > Anthony, > > I'm curious about your point (e). Would this in particular be > executemanyprepared()? And if so, are there any examples of its use? > There was some discussion about it > (http://mail.python.org/pipermail/db-sig/2004-April/004026.html), but no > actual examples. Instead the discussion veered off to the use of > external tables, but for me, external tables have too many DBA issues to > actually use. > > - Jim > > > Anthony Tuininga wrote: > >> A few points: >> >> ... >> >> e) If you really need performance and don't mind deviating from the DB >> API to get it, there are a number of methods that can be used to avoid >> the overhead of creating result tuples and binding parameters. This >> will definitely bring the performance numbers more in line with the >> Pro*C numbers but at the cost of less portable code. > > > > > > ------------------------------------------------------- > The SF.Net email is sponsored by: Beat the post-holiday blues > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. > It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users -- Anthony Tuininga an...@co... Computronix Distinctive Software. Real People. Suite 200, 10216 - 124 Street NW Edmonton, AB, Canada T5N 4A3 Phone: (780) 454-3700 Fax: (780) 454-3838 http://www.computronix.com |
From: Christopher J. B. <cjb...@al...> - 2005-01-05 04:56:39
|
Anthony Tuininga wrote: > a) I've noted that there is > about a 10-15% performance penalty for writing code in Python for the > stuff I've written Right, that is expected and acceptable but a 100% much less a 500% penalty leads me to believe I'm doing something wrong. > b) The code "compiled_curs.execute(None, row[0], row[2]) is not valid > since the execute() method only accepts one argument (a list of > parameters) or a set of keyword parameters; what is going on here?? Yeah sorry, I wrote that code snippet from memory. It was originally suppose to be pseudocode, but since Python is said to be executable pseudocode, it was just easier to write it in Python since thats what I'm current used to...;) > c) Is there any reason you didn't use an iterator? Using the code "for > row in curs:" is more efficient than using the method fetchmany() and is > considerably easier to read as well. I didn't know you could do that, infact I made a generator so I could say "for row in gen(curs):". > d) Are you actually doing anything with the rows from the first cursor > other than using them as input to the second cursor? If so, you'd be far > better off to create a view that put those two queries together -- the > performance increase will be quite significant. I don't have the code in front of me and I wrote it almost a month ago, but I think the results are used in file output as well as input to the second cursor...among other things as well. I actually know very little about databases, I'll have to research this "view" concept. > e) If you really need performance and don't mind deviating from the DB > API to get it, there are a number of methods that can be used to avoid > the overhead of creating result tuples and binding parameters. This will > definitely bring the performance numbers more in line with the Pro*C > numbers but at the cost of less portable code. Where can I find documentation on this? > f) What kind of times are we talking about here? Are you saying that one > method is 50 ms and the other is 10 ms? Or is it 50 minutes and 10 > minutes? In other words, is this an operation that is being performed > many times? 40 ms is not a big deal if it happens a few times but a big > problem if it happens a few million times, right? Our programs run from 15 mins to hours. In our development environment, they run for a little over a minute, where as the Pro*C runs in less than 10 seconds. I've heard in the production environment, the Pro*C can take up to 30 mins and as we expand our stores, we can expect times to get over an hour. > So, to summarize: I need more information in order to determine whether > this is a "real" performance problem or not. If you could give me some > actual code that I can execute to perform my own testing, that would be > great. I don't really know what else I could besides give you the actual code, but its so specific to our system and database, I don't know how you could run it. The general algorithms are the same for both the Pro*C and Python code...thats what is confusing me. The Python code is actually a little more stripped down. I'm almost positive its something on my end thats causing the problem. When I have some time, I'm going to rewrite the Pro*C to exactly mimic the Python code, run some tests, and then get back to you. Thank you for your reply, now I have a few leads to go on. > Christopher J. Bottaro wrote: >> I have some code structured like this: >> >> curs.execute(some_sql) >> rows.fetchmany() >> while rows: >> for row in rows: >> compiled_curs.execute(None, row[0], row[2]) >> more_rows = compiled_curs.fetchmany() >> while more_rows: >> #do something with them >> more_rows = compiled_curs.fetchmany() >> rows = curs.fetchmany() >> # arraysize is set to 3000 for both cursors >> >> The problem is that it is so slow. Yes, I realize that its not good to >> execute sql in a large for loop, but my problem is that this code is >> something like 5x as slow as the corresponding Pro*C code. >> >> Whats going on? Am I just not using the cx_Oracle module efficiently? >> Why is embedding sql in Pro*C code *so* much faster. >> >> Thanks for the help, I hate coding in Pro*C and C ever since I learned >> Python...please tell me I'm doing something wrong, I'd hate to go back to >> C! >> >> P.S. I'm using cx_Oracle-4.1-beta1. >> >> >> >> ------------------------------------------------------- >> The SF.Net email is sponsored by: Beat the post-holiday blues >> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. >> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt >> _______________________________________________ >> cx-oracle-users mailing list >> cx-...@li... >> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users > |
From: Anthony T. <an...@co...> - 2005-01-05 19:50:01
|
Christopher J. Bottaro wrote: > Anthony Tuininga wrote: > > >>a) I've noted that there is >>about a 10-15% performance penalty for writing code in Python for the >>stuff I've written > > > Right, that is expected and acceptable but a 100% much less a 500% penalty > leads me to believe I'm doing something wrong. Agreed. Be aware, however, that the overhead of fetching (creating tuples) and binding parameters (creating dictionaries) can be as much as 300% in certain cases. I know this from my writing of CopyData which originally was about 3x slower than the original written in C++ but after writing executemanyprepared() and fetchraw() and the like I was able to get it to perform better than the C++ version. >>b) The code "compiled_curs.execute(None, row[0], row[2]) is not valid >>since the execute() method only accepts one argument (a list of >>parameters) or a set of keyword parameters; what is going on here?? > > > Yeah sorry, I wrote that code snippet from memory. It was originally > suppose to be pseudocode, but since Python is said to be executable > pseudocode, it was just easier to write it in Python since thats what I'm > current used to...;) Understood. :-) >>c) Is there any reason you didn't use an iterator? Using the code "for >>row in curs:" is more efficient than using the method fetchmany() and is >>considerably easier to read as well. > > > I didn't know you could do that, infact I made a generator so I could say > "for row in gen(curs):". Sure. The generator will definitely be slower so you'll gain something here. :-) >>d) Are you actually doing anything with the rows from the first cursor >>other than using them as input to the second cursor? If so, you'd be far >>better off to create a view that put those two queries together -- the >>performance increase will be quite significant. > > > I don't have the code in front of me and I wrote it almost a month ago, but > I think the results are used in file output as well as input to the second > cursor...among other things as well. > > I actually know very little about databases, I'll have to research this > "view" concept. Good idea. :-) >>e) If you really need performance and don't mind deviating from the DB >>API to get it, there are a number of methods that can be used to avoid >>the overhead of creating result tuples and binding parameters. This will >>definitely bring the performance numbers more in line with the Pro*C >>numbers but at the cost of less portable code. > > > Where can I find documentation on this? The documentation is included in the documentation for cx_Oracle with big caveats posted indicating that it deviates from the DB API. The documentation might be a little sparse, though, so you might want to acquire cx_OracleTools and look at CopyData.py and others of that nature to see how I used it myself. If you have further questions, feel free to ask. >>f) What kind of times are we talking about here? Are you saying that one >>method is 50 ms and the other is 10 ms? Or is it 50 minutes and 10 >>minutes? In other words, is this an operation that is being performed >>many times? 40 ms is not a big deal if it happens a few times but a big >>problem if it happens a few million times, right? > > > Our programs run from 15 mins to hours. In our development environment, > they run for a little over a minute, where as the Pro*C runs in less than > 10 seconds. I've heard in the production environment, the Pro*C can take > up to 30 mins and as we expand our stores, we can expect times to get over > an hour. Ok. >>So, to summarize: I need more information in order to determine whether >>this is a "real" performance problem or not. If you could give me some >>actual code that I can execute to perform my own testing, that would be >>great. > > > I don't really know what else I could besides give you the actual code, but > its so specific to our system and database, I don't know how you could run > it. That's usually the issue. You would have to provide an example that was similar to your code but that I could build and run myself. > The general algorithms are the same for both the Pro*C and Python > code...thats what is confusing me. The Python code is actually a little > more stripped down. I'm almost positive its something on my end thats > causing the problem. When I have some time, I'm going to rewrite the Pro*C > to exactly mimic the Python code, run some tests, and then get back to you. Sounds good. > Thank you for your reply, now I have a few leads to go on. You're welcome. >>Christopher J. Bottaro wrote: >> >>>I have some code structured like this: >>> >>>curs.execute(some_sql) >>>rows.fetchmany() >>>while rows: >>> for row in rows: >>> compiled_curs.execute(None, row[0], row[2]) >>> more_rows = compiled_curs.fetchmany() >>> while more_rows: >>> #do something with them >>> more_rows = compiled_curs.fetchmany() >>> rows = curs.fetchmany() >>># arraysize is set to 3000 for both cursors >>> >>>The problem is that it is so slow. Yes, I realize that its not good to >>>execute sql in a large for loop, but my problem is that this code is >>>something like 5x as slow as the corresponding Pro*C code. >>> >>>Whats going on? Am I just not using the cx_Oracle module efficiently? >>>Why is embedding sql in Pro*C code *so* much faster. >>> >>>Thanks for the help, I hate coding in Pro*C and C ever since I learned >>>Python...please tell me I'm doing something wrong, I'd hate to go back to >>>C! >>> >>>P.S. I'm using cx_Oracle-4.1-beta1. >>> >>> >>> >>>------------------------------------------------------- >>>The SF.Net email is sponsored by: Beat the post-holiday blues >>>Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. >>>It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt >>>_______________________________________________ >>>cx-oracle-users mailing list >>>cx-...@li... >>>https://lists.sourceforge.net/lists/listinfo/cx-oracle-users >> > > > > > ------------------------------------------------------- > The SF.Net email is sponsored by: Beat the post-holiday blues > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. > It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users -- Anthony Tuininga an...@co... Computronix Distinctive Software. Real People. Suite 200, 10216 - 124 Street NW Edmonton, AB, Canada T5N 4A3 Phone: (780) 454-3700 Fax: (780) 454-3838 http://www.computronix.com |