Thread: [SQLObject] Indexed access to a SelectResults
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: Matt G. <ma...@po...> - 2003-07-16 13:28:56
|
If you have code like this all = Entity.select('all') for i in range(len(all)): print all[i] you get a whole stream of SQL calls. For a table with n rows you actually get 2n+1 SQL calls: 1 initial COUNT(*) and then a COUNT(*) and SELECT for each row. It makes sense for a SelectResults objects to cache the COUNT result on the first len() so that future len() calls don't go back to the database? Another relatively easy way of improving performance in the above scenario would be if SelectResults read and cache blocks of objects, 10 at a time for example, even if only one of those objects was actually returned to the caller. Anyway, just some thoughts to help SQLObject "scale up" a bit. I have no idea whether these ideas would break other parts of SQLObject ;-). Cheers, Matt -- Matt Goodall, Pollenation Internet Ltd w: http://www.pollenationinternet.com e: ma...@po... |
From: Ian B. <ia...@co...> - 2003-07-16 19:21:27
|
On Wed, 2003-07-16 at 08:37, Matt Goodall wrote: > If you have code like this > > all = Entity.select('all') > for i in range(len(all)): > print all[i] > > you get a whole stream of SQL calls. For a table with n rows you > actually get 2n+1 SQL calls: 1 initial COUNT(*) and then a COUNT(*) and > SELECT for each row. > > It makes sense for a SelectResults objects to cache the COUNT result on > the first len() so that future len() calls don't go back to the database? Huh... I don't know why it's creating lots of COUNT calls. Is it really? It would, though, create a bunch of SELECTs, like SELECT blah LIMIT 1 OFFSET 0, SELECT blah LIMIT 1 OFFSET 1, etc. The problem is the way you're using it. You should iterate directly over the result, or force getting everything at once by doing list(all). Like either: all = list(Entity.select()) for i in range(len(all)): print all[i] # or: all = Entity.select() for i, obj in enumerate(all): print obj Of course, the second is better as you won't pull all the objects into memory at once. enumerate isn't defined in Python 2.2, so you can use: from __future__ import generators def enumerate(iterable): i = 0 for obj in iterable: yield i, obj i += 1 |
From: Matt G. <ma...@po...> - 2003-07-16 22:42:33
|
On Wed, 2003-07-16 at 20:22, Ian Bicking wrote: > On Wed, 2003-07-16 at 08:37, Matt Goodall wrote: > > If you have code like this > > > > all = Entity.select('all') > > for i in range(len(all)): > > print all[i] > > > > you get a whole stream of SQL calls. For a table with n rows you > > actually get 2n+1 SQL calls: 1 initial COUNT(*) and then a COUNT(*) and > > SELECT for each row. > > > > It makes sense for a SelectResults objects to cache the COUNT result on > > the first len() so that future len() calls don't go back to the database? > > Huh... I don't know why it's creating lots of COUNT calls. Is it > really? Yes. It seems to be the last line of SelectResults.__getitem__ that is causing this, not sure why at the moment otherwise I'd send a patch. > It would, though, create a bunch of SELECTs, like SELECT blah > LIMIT 1 OFFSET 0, SELECT blah LIMIT 1 OFFSET 1, etc. > > The problem is the way you're using it. Unfortunately, it's not my code - it's another package that is walking the list using an index. I can probably change that package but couldn't/shouldn't SQLObject be a bit more graceful under these circumstances? To be honest, it's really the multiple COUNT calls that bother me as it seems unnecessary. The multiple SELECT calls to retrieve the objects are to be expected but can possibly be avoided. > You should iterate directly > over the result, or force getting everything at once by doing list(all). > Like either: > > all = list(Entity.select()) > for i in range(len(all)): > print all[i] This is effectively how I am working around the "problem" for now, by passing a list() rather than a SelectResults. > > # or: > all = Entity.select() > for i, obj in enumerate(all): > print obj I'm not sure I see the benefit of this. Doesn't SQLObject already use a generator to walk the result of the query? Wouldn't this just wrap one generator in another? Thanks, Matt -- Matt Goodall, Pollenation Internet Ltd w: http://www.pollenation.net e: ma...@po... |
From: Ian B. <ia...@co...> - 2003-07-17 00:05:16
|
On Wed, 2003-07-16 at 17:42, Matt Goodall wrote: > > It would, though, create a bunch of SELECTs, like SELECT blah > > LIMIT 1 OFFSET 0, SELECT blah LIMIT 1 OFFSET 1, etc. > > > > The problem is the way you're using it. > > Unfortunately, it's not my code - it's another package that is walking > the list using an index. I can probably change that package but > couldn't/shouldn't SQLObject be a bit more graceful under these > circumstances? I don't know. If the code is going to treat select results as a list, a list seems like a reasonable thing to use (i.e., list(Whatever.select())). You could, generically, have a proxy class that batches the fetches, and caches the results temporarily (while it's waiting for your code to do another loop). There's no reason it has to have specific support for SQLObject either. Maybe such a thing already exists (probably does in *someone's* code... maybe on the ASPN recipe site). > To be honest, it's really the multiple COUNT calls that bother me as it > seems unnecessary. The multiple SELECT calls to retrieve the objects are > to be expected but can possibly be avoided. Yeah. That is weird. Ah... after some effort, I'm afraid I found why it happens. list() tries to get the length of the things it's converting. That's a pain in the but. Maybe I'll post to comp.lang.python about how to get around this. > > all = list(Entity.select()) > > for i in range(len(all)): > > print all[i] > > This is effectively how I am working around the "problem" for now, by > passing a list() rather than a SelectResults. > > > > > # or: > > all = Entity.select() > > for i, obj in enumerate(all): > > print obj > > I'm not sure I see the benefit of this. Doesn't SQLObject already use a > generator to walk the result of the query? Wouldn't this just wrap one > generator in another? That's how enumerate is supposed to work. It's for when the underlying code really wants to get an index along with the object. If that code you're using *doesn't* use that index, then that's just bad programming, and it should iterate directly. Ian |
From: Matt G. <ma...@po...> - 2003-07-17 00:45:03
|
On Thu, 2003-07-17 at 01:05, Ian Bicking wrote: > On Wed, 2003-07-16 at 17:42, Matt Goodall wrote: > > > It would, though, create a bunch of SELECTs, like SELECT blah > > > LIMIT 1 OFFSET 0, SELECT blah LIMIT 1 OFFSET 1, etc. > > > > > > The problem is the way you're using it. > > > > Unfortunately, it's not my code - it's another package that is walking > > the list using an index. I can probably change that package but > > couldn't/shouldn't SQLObject be a bit more graceful under these > > circumstances? > > I don't know. If the code is going to treat select results as a list, a > list seems like a reasonable thing to use (i.e., > list(Whatever.select())). Perhaps you're right. The best solution is probably to change the code I'm using to iterate over the results in the normal manner. This code in question could actually take a slice first which would work really nicely with SQLObject. Something for me to look into. > > You could, generically, have a proxy class that batches the fetches, and > caches the results temporarily (while it's waiting for your code to do > another loop). There's no reason it has to have specific support for > SQLObject either. Maybe such a thing already exists (probably does in > *someone's* code... maybe on the ASPN recipe site). Now that's not a bad idea, thanks. > > > To be honest, it's really the multiple COUNT calls that bother me as it > > seems unnecessary. The multiple SELECT calls to retrieve the objects are > > to be expected but can possibly be avoided. > > Yeah. That is weird. Ah... after some effort, I'm afraid I found why > it happens. list() tries to get the length of the things it's > converting. Are you sure it's that simple? I was just playing with exactly this idea too and wrote the following code to experiment: class Test: def __init__(self): self.counter = 0 def __iter__(self): return self def next(self): self.counter = self.counter + 1 if self.counter > 5: raise StopIteration() return self.counter t = Test() a = list(t) print a # ok up to here print len(t) # this fails The fact that creating the list did not fail implies that it is not trying to get the length of t. I dunno though, it's very late here so I may be missing something obvious. - Matt -- Matt Goodall, Pollenation Internet Ltd w: http://www.pollenation.net e: ma...@po... |
From: Ian B. <ia...@co...> - 2003-07-17 00:51:45
|
On Wed, 2003-07-16 at 19:44, Matt Goodall wrote: > The fact that creating the list did not fail implies that it is not > trying to get the length of t. I dunno though, it's very late here so I > may be missing something obvious. list() gets the length if it can, but it doesn't cause a problem if it can't (it catches all exceptions). It took me a while to figure out what was happening, since I made the COUNT(*) raise an exception to see who was calling it, and the exception just disappeared. Ian |