Thread: [SQLObject] SQLObject Performance
SQLObject is a Python ORM.
From: Colin S. <co...@ow...> - 2004-10-04 23:02:16
Hi,

I've just started using SQLObject in a program that loads web server log information into a MySQL database. The program consists of a tight loop that reads a web server entry, parses the data, performs one SQL lookup, and then inserts the data into a table.

When I use the MySQL libraries directly the program handles ~290 rec/s. When I use SQLObject to perform the lookup and SQL insert the rate drops to 60 rec/s.

Is it unrealistic to use SQLObject for DB interaction when handling batch loads of data? I've done a quick profile of the code (top few calls below) and nothing jumps out as being particularly easy to optimise...

    ncalls       tottime  percall  cumtime  percall  filename:lineno(function)
    15308/14042    2.500    0.000    4.430    0.000  converters.py:179(sqlrepr)
     1210          2.310    0.002    3.830    0.003  main.py:755(set)
     3630          2.140    0.001    3.700    0.001  cursors.py:166(__do_query)
    14042          1.370    0.000    5.800    0.000  dbconnection.py:438(sqlrepr)
    23244          1.240    0.000    1.240    0.000  main.py:1176(instanceName)
     3419          0.830    0.000   15.820    0.005  dbconnection.py:114(_runWithConnection)
    14253          0.810    0.000    0.810    0.000  converters.py:81(lookupConverter)
     1210          0.800    0.001    1.510    0.001  main.py:812(_SO_selectInit)

Colin.
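For concreteness, the raw-driver side of the loop Colin describes might look roughly like the sketch below. The parse_line() helper and the host/hit tables are invented for illustration; only the one-lookup-then-one-insert shape per record matters.

    import MySQLdb

    # Minimal sketch of the raw-MySQLdb loop (the ~290 rec/s baseline).
    # parse_line() and the host/hit tables are made up for illustration.
    conn = MySQLdb.connect(db="weblogs", user="logs", passwd="secret")
    cur = conn.cursor()

    for line in open("access.log"):
        entry = parse_line(line)        # hypothetical parser, not shown
        # the single lookup per record
        cur.execute("SELECT id FROM host WHERE name = %s", (entry["host"],))
        row = cur.fetchone()
        if row:
            host_id = row[0]
        else:
            host_id = None
        # the single insert; the driver receives the values separately
        # and handles the quoting itself
        cur.execute("INSERT INTO hit (host_id, path, status) VALUES (%s, %s, %s)",
                    (host_id, entry["path"], entry["status"]))

    conn.commit()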
From: Ian B. <ia...@co...> - 2004-10-05 04:12:47
Colin Stewart wrote:
> Hi,
>
> I've just started using SQLObject in a program that loads web server log
> information into a MySQL database. The program consists of a tight loop
> that reads a web server entry, parses the data, performs one SQL lookup,
> and then inserts the data into a table.
>
> When I use the MySQL libraries directly the program handles ~290 rec/s.
> When I use SQLObject to perform the lookup and SQL insert the rate drops
> to 60 rec/s.

For that use case, it's hard to say. Really the value of an ORM decreases when you are dealing with large datasets, especially when dealing with that data as a collection, like you would with web server logs. You're really going to want to deal with that data inside MySQL, not in Python -- e.g., to get a hit count, you'll want to run the appropriate SQL command, not load the rows and count them in Python. You can do a count in SQLObject, but there's lots of aggregate functions that you can't do, so you'll hit a wall.

> Is it unrealistic to use SQLObject for DB interaction when handling
> batch loads of data? I've done a quick profile of the code (top few
> calls below) and nothing jumps out as being particularly easy to optimise...
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 15308/14042 2.500 0.000 4.430 0.000 converters.py:179(sqlrepr)

It's interesting that sqlrepr is at the top. I'll have to think about how I'm using it. It's also been suggested that SQLObject rely on the database driver's quoting instead of doing its own. This may lend more weight to that opinion.

> 1210 2.310 0.002 3.830 0.003 main.py:755(set)
> 3630 2.140 0.001 3.700 0.001 cursors.py:166(__do_query)
> 14042 1.370 0.000 5.800 0.000 dbconnection.py:438(sqlrepr)
> 23244 1.240 0.000 1.240 0.000 main.py:1176(instanceName)
> 3419 0.830 0.000 15.820 0.005 dbconnection.py:114(_runWithConnection)
> 14253 0.810 0.000 0.810 0.000 converters.py:81(lookupConverter)
> 1210 0.800 0.001 1.510 0.001 main.py:812(_SO_selectInit)

--
Ian Bicking / ia...@co... / http://blog.ianbicking.org
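As a rough illustration of Ian's point, assuming a hypothetical Hit class mapped to the log table (connection setup omitted, and the import spelling varies by SQLObject version): a COUNT can be pushed down to the database through .count(), but other aggregates mean dropping to raw SQL on the connection.

    from SQLObject import *          # 0.6-era package name; newer releases use lowercase 'sqlobject'

    class Hit(SQLObject):            # hypothetical class mapped to the log table
        path = StringCol()
        status = IntCol()
        # _connection setup omitted in this sketch

    # COUNT runs in the database, not in Python
    n_404 = Hit.select(Hit.q.status == 404).count()

    # For anything beyond COUNT, drop to raw SQL on the connection;
    # queryAll() hands back plain tuples rather than Hit instances.
    rows = Hit._connection.queryAll(
        "SELECT path, COUNT(*) FROM hit GROUP BY path ORDER BY COUNT(*) DESC")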
From: Colin S. <co...@ow...> - 2004-10-06 22:13:13
Hi Ian,

> > I've just started using SQLObject in a program that loads web server log
> > information into a MySQL database. The program consists of a tight loop
> > that reads a web server entry, parses the data, performs one SQL lookup,
> > and then inserts the data into a table.
> >
> > When I use the MySQL libraries directly the program handles ~290 rec/s.
> > When I use SQLObject to perform the lookup and SQL insert the rate drops
> > to 60 rec/s.
>
> For that use case, it's hard to say. Really the value of an ORM
> decreases when you are dealing with large datasets, especially when
> dealing with that data as a collection, like you would with web server
> logs. You're really going to want to deal with that data inside MySQL,
> not in Python -- e.g., to get a hit count, you'll want to run the
> appropriate SQL command, not load the rows and count them in Python.
> You can do a count in SQLObject, but there's lots of aggregate functions
> that you can't do, so you'll hit a wall.

Is it safe to use the DBAPI.getConnection() call to get the underlying connection and use this mixed in with ordinary SQLObject calls? It would be nice to be able to mix raw SQL and SQLObject as needed over the same connection. I'm not sure how SQLObject's caching works, but there's presumably a mechanism for invalidating results from individual tables?

> > Is it unrealistic to use SQLObject for DB interaction when handling
> > batch loads of data? I've done a quick profile of the code (top few
> > calls below) and nothing jumps out as being particularly easy to optimise...
> >
> > ncalls tottime percall cumtime percall filename:lineno(function)
> > 15308/14042 2.500 0.000 4.430 0.000 converters.py:179(sqlrepr)
>
> It's interesting that sqlrepr is at the top. I'll have to think about
> how I'm using it. It's also been suggested that SQLObject rely on the
> database driver's quoting instead of doing its own. This may lend more
> weight to that opinion.

Using the db driver's quoting would also mean that the SQL can be cached as prepared statements by the DB. For some (e.g. Oracle) that can lead to a pretty significant speed up in itself.

Thanks for your reply,

Colin.
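The mixed approach Colin is asking about might look roughly like the sketch below. getConnection() and releaseConnection() are the DBAPI-level calls he mentions; the Hit class, the raw SQL, and the literal values are invented for illustration.

    # Borrow the underlying DB-API connection from SQLObject's pool,
    # run raw SQL over it, then hand it back.
    sqlconn = Hit._connection            # the SQLObject connection object
    raw = sqlconn.getConnection()        # the underlying DB-API connection
    try:
        cur = raw.cursor()
        cur.execute("INSERT INTO hit (path, status) VALUES (%s, %s)",
                    ("/index.html", 200))
        # depending on the connection's autocommit setting, an explicit
        # raw.commit() may be needed here
    finally:
        sqlconn.releaseConnection(raw)   # return it to the pool

    # ...while still going through SQLObject where the object layer helps
    errors = list(Hit.select(Hit.q.status == 500))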
From: Ian B. <ia...@co...> - 2004-10-08 17:41:18
Colin Stewart wrote:
> Is it safe to use the DBAPI.getConnection() call to get the underlying
> connection and use this mixed in with ordinary SQLObject calls? It
> would be nice to be able to mix raw SQL and SQLObject as needed over the
> same connection. I'm not sure how SQLObject's caching works, but
> there's presumably a mechanism for invalidating results from individual
> tables?

If you look back in the archives a couple weeks, I outlined some ways to expire the cache manually. Generally if you are doing inserts it should be fine, as there's nothing related to inserts that gets cached. Joins and selects are not cached (except for .get), though they will return the cached versions of objects. With an insert, this isn't an issue. Doing deletes or updates could cause cache consistency problems, so you'd have to expire the cache.

>>> Is it unrealistic to use SQLObject for DB interaction when handling
>>> batch loads of data? I've done a quick profile of the code (top few
>>> calls below) and nothing jumps out as being particularly easy to optimise...
>>>
>>> ncalls tottime percall cumtime percall filename:lineno(function)
>>> 15308/14042 2.500 0.000 4.430 0.000 converters.py:179(sqlrepr)
>>
>> It's interesting that sqlrepr is at the top. I'll have to think about
>> how I'm using it. It's also been suggested that SQLObject rely on the
>> database driver's quoting instead of doing its own. This may lend more
>> weight to that opinion.
>
> Using the db driver's quoting would also mean that the SQL can be cached
> as prepared statements by the DB. For some (e.g. Oracle) that can lead
> to a pretty significant speed up in itself.

I don't know enough about prepared statements to know if it would help. If it's possible to prepare, up front, three or four statements for every class, then it could help. If you have to prepare a statement, then use it several times, then prepare another statement, it's unlikely to help, since SQLObject doesn't know enough to predict what statements are going to be used in sequence often enough to help.

Also, the postgres and mysql drivers (and probably sqlite) don't have any prepared statements, and do all the quoting on the client side. I'm most interested in those backends, so there's not a huge amount to be gained. They probably do the quoting in C, though, with some performance gain there.

--
Ian Bicking / ia...@co... / http://blog.ianbicking.org
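A rough sketch of the manual expiry Ian describes, using the per-instance expire() call; the Person class and the raw UPDATE are invented here, and the archive thread he mentions covers the details.

    # Keep the cache honest around a raw UPDATE (same idea applies to DELETE).
    person = Person.get(42)
    Person._connection.query("UPDATE person SET name = 'Ford' WHERE id = 42")
    person.expire()        # throw away the cached column values for this row
    print person.name      # the next attribute access re-fetches from the database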
From: Brad B. <br...@bb...> - 2004-10-09 10:57:53
On Friday, October 8, 2004, at 06:40 PM, Ian Bicking wrote:
> Colin Stewart wrote:
[snip]
>> Using the db driver's quoting would also mean that the SQL can be
>> cached as prepared statements by the DB. For some (e.g. Oracle) that
>> can lead to a pretty significant speed up in itself.
>
> I don't know enough about prepared statements to know if it would
> help. If it's possible to prepare, up front, three or four statements
> for every class, then it could help. If you have to prepare a
> statement, then use it several times, then prepare another statement,
> it's unlikely to help, since SQLObject doesn't know enough to predict
> what statements are going to be used in sequence often enough to help.
>
> Also, the postgres and mysql drivers (and probably sqlite) don't have
> any prepared statements, and do all the quoting on the client side.
> I'm most interested in those backends, so there's not a huge amount to
> be gained. They probably do the quoting in C, though, with some
> performance gain there.

My interest is in PostgreSQL, so I'll comment on that. psycopg claims to support the DB-API v2, and thus should have prepared statements.

So here's a unit test that demonstrates, perhaps, a decent way of hooking up bare SQL to a method (untested, I stayed up till 4am last night playing Mao, and I'm off to Heathrow in 10 minutes, so please forgive me if this code is a bit dirty. :) This may need to be molded around slightly to solve your problem, but I hope it illustrates an interesting new idea and gets some thinking going on about this problem:

    class SQLMethodTest(SQLObjectTest):

        def testSQLMethod(self):

            class USAddressesResult(object):

                class USCitizen(object):
                    def __init__(self, id, name, address):
                        self.id = id
                        self.name = name
                        self.address = address

                def __init__(self, results):
                    self.results = results

                def __iter__(self):
                    for r in self.results:
                        yield self.USCitizen(r[0], r[1], r[2])

            class Person(SQLObject):
                usaddresses = SQLMethod(
                    sql = """\
                        SELECT p.id, p.name, a.address
                        FROM person p, address a
                        WHERE p.id = a.personid
                        AND a.country_code = 'US'""",
                    result = USAddressesResult)
                usaddresses = classmethod(usaddresses)

            uscitizens = list(Person.usaddresses())
            self.assertEquals(len(uscitizens), 2)
            self.assertEquals(uscitizens[0].name, "Ford Prefect")
            self.assertEquals(uscitizens[0].address, "Somewhere near Betelgeuse")

The benefits gained from this feature would be:

1. The speed of working with raw SQL.

2. Being able to get a resultset that actually does what you need (since the current concept that "an element in a result set maps to one row in a table" often doesn't make sense, as each row returned from a SELECT query is often *not* a row in a table.)

3. You could, of course, use placeholders in the SQL statement and then the SQLMethod method would accept that many params.

Thoughts, comments, feedback welcome. Gotta run,

--
Brad Bollenbach
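For what it's worth, one possible shape for SQLMethod itself, purely a sketch of Brad's idea and not existing SQLObject API, would be a descriptor that runs the raw SQL through the class's connection and wraps the rows in the supplied result class:

    class SQLMethod(object):
        # Sketch only: wraps a raw SQL string and a result-wrapper class,
        # as in the test above.  Placeholder params are left out for brevity.
        def __init__(self, sql, result):
            self.sql = sql
            self.result = result

        def __get__(self, obj, cls):
            # Accessed as Person.usaddresses; return a callable that runs
            # the SQL over the class's connection and wraps the rows.
            def run():
                rows = cls._connection.queryAll(self.sql)
                return self.result(rows)
            return run

With a descriptor along these lines, the classmethod() wrapping in the test above would not be needed, since Person.usaddresses already resolves to a callable that runs the query.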
From: Stuart B. <stu...@ca...> - 2004-10-11 12:49:27
Brad Bollenbach wrote:
| My interest is in PostgreSQL, so I'll comment on that. psycopg claims to
| support the DB-API v2, and thus should have prepared statements.

psycopg 1.xx does not - it just pretends to, as when it was written PostgreSQL didn't support them. The experimental psycopg 2 does support them, I believe.

--
Stuart Bishop <stu...@ca...> http://www.canonical.com/
Canonical Ltd. http://www.ubuntulinux.com/