From: Ben B. <be...@gr...> - 2005-05-03 23:49:35
|
I frequently find myself using SQLObject to pull results and show related objects obtained by doing a MultipleJoin; however, the SQL behind this is, to my understanding, a bit database-heavy. If I pull a list of 30 people and then iterate over them, each iteration gets turned into a SELECT, each call to their related table is another SELECT, etc. Is there anything currently in SQLObject, or in the works, that will pull and cache more of this data in advance, and hopefully turn it into one query?

While I'm sure a lot of us have heard enough about Rails to make one sick, I did notice one of the "big" features they recently introduced was "eager associations", which seem to do exactly what I'm looking for in this case. Details are here: http://ar.rubyonrails.com/classes/ActiveRecord/Associations/ClassMethods.html (under Eager Associations)

Can we do something like that with SQLObject to cache specified related objects in advance? This would certainly reduce my SQL queries significantly for some of my pages.

Thanks, Ben |
From: Ben B. <be...@gr...> - 2005-05-04 23:00:32
|
I have two tables that I want to pull data out of, based on a one-to-many relationship. This happens fairly frequently, sometimes with two tables. What I'd like is a way to fetch all the data AND the related data in one go.

For example, I have a table of user data, and it relates to a table of addresses, with each user having multiple addresses listed. I pull a group of people out of the user table, then iterate over it showing their addresses. As far as I can tell, this is very inefficient, as SQLObject is issuing a SELECT query for each person I list, then another SELECT query against their addresses when I list those. Is there no way to pull the whole thing in one SQL query and get SQLObjects that are already fetched, including a specified related object?

I noticed this is one of the major things the Rails folk touted with "eager associations" in their latest release, and I'd love to be able to do something like this with SQLObject. I'm not sure what the syntax would look like to perform this operation with SQLObject; the Rails people did it like so: http://ar.rubyonrails.com/classes/ActiveRecord/Associations/ClassMethods.html if you scroll down to their section on Eager Associations.

Can SQLObject cache the related results right now using some sort of extension to list(User.select(...)) ?

Thanks, Ben |
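The 1+N query pattern Ben is describing can be made concrete with a toy model. This is plain Python standing in for SQLObject; FakeDB and the user/address data are all invented for illustration, with a counter showing how the queries add up.

```python
class FakeDB:
    """Pretend database that counts every SELECT issued against it."""
    def __init__(self):
        self.queries = 0
        self.users = {1: 'Alice', 2: 'Bob'}
        self.addresses = {1: ['12 Oak St'], 2: ['3 Elm St', '9 Pine Rd']}

    def select_users(self):
        self.queries += 1                  # one query for the user list
        return list(self.users)

    def select_addresses(self, user_id):
        self.queries += 1                  # one more query per user
        return self.addresses[user_id]

db = FakeDB()
for user_id in db.select_users():          # 1 SELECT
    db.select_addresses(user_id)           # N further SELECTs
assert db.queries == 1 + len(db.users)     # 1+N: already 3 queries for 2 users
```

With a joined (eager) fetch, the same data would come back in a single query; the rest of the thread discusses how to get SQLObject to do that.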
From: David M. C. <da...@da...> - 2005-05-04 02:14:17
|
On Tue, May 03, 2005 at 04:49:25PM -0700, Ben Bangert wrote:

> Is there anything currently in SQLObject, or in the works, that will pull
> and cache more of this data in advance, and hopefully turn it into one
> query? While I'm sure a lot of us have heard enough about Rails to make one

Didn't Michel or Oleg or somebody say they'd implemented OUTERJOIN in CVS? I tried LEFTJOINOn, but that's broken.

The usual way is described in the FAQ:

http://www.sqlobject.org/docs/FAQ.html#how-can-i-do-a-left-join

Or you can do one SQL query and retrieve the objects by ID with .get() (not tested and not benchmarked against the other methods):

result = Contact._connection.queryAll("""SELECT cs.id, cn.id FROM customer cs
    LEFT JOIN contact cn
    ON (cs.id=cn.customer_id)
    WHERE cs.first_name LIKE 'J%'""")
customers = []
custContacts = {}
for customer_id, contact_id in result:
    if customer_id not in custContacts:
        customer = Customer.get(customer_id)
        customers.append(customer)
    if contact_id is not None:
        contact = Contact.get(contact_id)
        custContacts.setdefault(customer_id, []).append(contact)
    else:
        custContacts[customer_id] = []
for customer in customers:
    print customer
    for contact in custContacts[customer.id]:
        print "  ", contact

Dave Cook |
From: David M. C. <da...@da...> - 2005-05-04 02:41:47
|
On Tue, May 03, 2005 at 07:14:05PM -0700, David M. Cook wrote: > Didn't Michel or Oleg or somebody say they'd implemented OUTERJOIN in CVS? > I tried LEFTJOINOn, but that's broken. Oops, it was Oleg, and I missed the fact that you have to use the join keyword s = Composer.select(join=sqlbuilder.LEFTJOINOn(Composer, Work, Composer.q.id==Work.q.composerID)) Dave Cook |
From: Ben B. <bba...@gr...> - 2005-05-04 02:57:07
|
On May 3, 2005, at 7:41 PM, David M. Cook wrote:

>> Didn't Michel or Oleg or somebody say they'd implemented OUTERJOIN
>> in CVS?
>> I tried LEFTJOINOn, but that's broken.
>
> Oops, it was Oleg, and I missed the fact that you have to use the join
> keyword:
>
> s = Composer.select(join=sqlbuilder.LEFTJOINOn(Composer, Work,
>     Composer.q.id==Work.q.composerID))

So in that case, I would have a normal list of Composer objects to iterate over, and when I call Composer.works, the Work objects would already be fetched in one join?

- Ben |
From: David M. C. <da...@da...> - 2005-05-04 03:39:05
|
On Tue, May 03, 2005 at 07:57:22PM -0700, Ben Bangert wrote:

> So in that case, I would have a normal list of Composer objects to
> iterate over, and when I call Composer.works, the Work objects were
> already fetched in one join?

Doesn't look like it does that:

In [7]: s = Composer.select(join=sqlbuilder.LEFTJOINOn(Composer, Work,
    Composer.q.id==Work.q.composerID))

In [8]: s[0].works
SELECT composer.id, composer.name FROM composer LEFT JOIN work ON (composer.id = work.composer_id) WHERE 1 = 1 LIMIT 1
SELECT id FROM work WHERE composer_id = 1
SELECT composer_id, title FROM work WHERE id = 1
Out[8]: [<Work 1 composerID=1 title=u'Sacred Music'>]

As I said in a previous message in the LEFT JOIN thread, I'm not sure what the utility of this actually is.

Dave Cook |
From: Ben B. <be...@gr...> - 2005-05-04 02:54:30
|
On May 3, 2005, at 7:14 PM, David M. Cook wrote:

> Didn't Michel or Oleg or somebody say they'd implemented OUTERJOIN
> in CVS?
> I tried LEFTJOINOn, but that's broken.
>
> The usual way is described in the FAQ:
>
> http://www.sqlobject.org/docs/FAQ.html#how-can-i-do-a-left-join

Yea.... that example leaves a lot to be desired, especially when you have something to compare it against (the Rails way). It looks very crude, and it'd only get worse if I wanted two relations returned instead of just one additional.

> Or you can do one SQL query and retrieve the objects by ID with .get() (not
> tested and not benchmarked against the other methods):
>
> result = Contact._connection.queryAll("""SELECT cs.id, cn.id FROM customer cs
>     LEFT JOIN contact cn
>     ON (cs.id=cn.customer_id)
>     WHERE cs.first_name LIKE 'J%'""")
> customers = []
> custContacts = {}
> for customer_id, contact_id in result:
>     if customer_id not in custContacts:
>         customer = Customer.get(customer_id)
>         customers.append(customer)
>     if contact_id is not None:
>         contact = Contact.get(contact_id)
>         custContacts.setdefault(customer_id, []).append(contact)
>     else:
>         custContacts[customer_id] = []
> for customer in customers:
>     print customer
>     for contact in custContacts[customer.id]:
>         print "  ", contact

Right, now if you look at that, and compare it with what I believe would be the Rails equivalent:

result = Contact.find(:all, :include => :contact)

Which one looks nicer? I'm hoping we can get closer to the Rails way, as other projects hoping to be Rails-like (Subway) rely on SQLObject to make up the database-object mapper section of the framework, so this will help us all out (OK, I'll admit it, I obviously would like to avoid a few dozen lines of code to get my objects cached :) ). The FAQ docs say that optimizations are crude and "less than pretty"; it doesn't need to be that way, as we can see.

I've tried to look into the source of SQLObject some, but I can't make heads or tails of the metaclasses, so I'm hoping someone with more experience in this would be interested in implementing it, if anyone else believes this would be as useful as I do.

Thanks, Ben |
From: Ian B. <ia...@co...> - 2005-05-04 03:06:56
|
Ben Bangert wrote:

> I'm hoping we can get closer to the Rails way, as other projects hoping
> to be Rails-like (Subway) rely on SQLObject to make up the
> database-object mapper section of the framework so this will help us
> all out (ok, I'll admit it, I obviously would like to avoid a few dozen
> lines of code to get my objects cached :) ). The FAQ docs say that
> optimizations are crude and "less than pretty", it doesn't need to be
> that way as we can see.

Well, it means factoring out some of the SELECT stuff. Which might not be that hard; it's not like there are a lot of code points that are actually generating the stuff. In this case, there'd have to be some way to unpack multiple objects from a SELECT query, and a way to get that SELECT to include multiple tables.

The obviousish way to do it would be by adding an option to .select()/SelectResult that would eagerly include tables (either the given tables, or all the tables in the join). Then you'd construct the other objects, but throw them away, relying on the cache. Which is a little iffy; presumably you really want to keep them around, but right now joins are only optimized through caching. That is, an object only knows the id of a joined table, but OtherClass.get(id) is fast when the object is in memory. But it can be garbage collected if it goes out of the cache.

> I've tried to look into the source of SQLObject some, but I can't make
> heads or tails of the metaclasses so I'm hoping someone with some more
> experience in this would be interested in implementing it, if anyone
> else believes this would be as useful as I do.

Don't blame it on the metaclasses! There's only like one, and all it does is call __classinit__. And I hope that __classinit__ is fairly easy to understand -- it should be. Well, the contents might be a bit crufty, but the concept is simple enough.

-- 
Ian Bicking / ia...@co... / http://blog.ianbicking.org |
From: Ben B. <be...@gr...> - 2005-05-04 03:38:49
|
On May 3, 2005, at 8:07 PM, Ian Bicking wrote:

> Well, it means factoring out some of the SELECT stuff. Which might
> not be that hard; it's not like there's a lot of code points that
> are actually generating the stuff. In this case, there'd have to
> be some way to unpack multiple objects from a SELECT query, and a
> way to get that SELECT to include multiple tables.

I'm guessing that it'd also have to tack the constructed object class into the list where it's expected to be called from.

> The obviousish way to do it would be by adding an option to
> .select()/SelectResult that would eagerly include tables (either the given
> tables, or all the tables in the join). Then you'd construct the
> other objects, but throw them away, relying on the cache. Which is
> a little iffy; presumably you really want to keep them around, but
> right now joins are only optimized through caching. That is, an
> object only knows the id of a joined table, but OtherClass.get(id)
> is fast when the object is in memory. But it can be garbage
> collected if it goes out of the cache.

I don't understand how the caching works with SQLObject at all. Is there any place that mentions how long objects are cached for, how that cache can be tweaked (if it can), and how I can manually clear the cache? Also, I'm assuming that the objects are cached just for that thread/process; as I run in Apache prefork, this doesn't help me much. Is there any way to get to the object caches from another child process?

> Don't blame it on the metaclasses! There's only like one, and all
> it does is call __classinit__. And I hope that __classinit__ is
> fairly easy to understand -- it should be. Well, the contents
> might be a bit crufty, but the concept is simple enough.

I'm working on it. :) It's that first glimpse that always puzzles me for a few minutes.

Thanks, Ben |
From: Ian B. <ia...@co...> - 2005-05-04 04:04:22
|
Ben Bangert wrote:

> I'm guessing that it'd also have to tack the constructed object class
> into the list where it's expected to be called from.

No, you're just trying to avoid doing another database query. You could stick that instance into the parent class (whatever you are actually selecting), but that doesn't work well with how foreign keys currently work (where there are no actual references between instances, just ids that are used to fetch instances).

> I don't understand how the caching works with SQLObject at all. Is
> there any place that mentions how long objects are cached for, how that
> cache can be tweaked (if it can), and how I can manually clear the
> cache? Also, I'm assuming that the objects are cached just for that
> thread/process, as I run in Apache prefork this doesn't help me much,
> is there any way to get to the object caches from another child process?

It's definitely something that needs to be much better documented than it currently is.

A weak reference is always maintained to everything, so as long as an object is in memory it is available for reuse. In addition, real references are kept to most objects, with a culling process running every so often (every 100 fetches or something, I think) that moves some objects from the strong cache to the weak cache. You can "clear" the cache using an undocumented method that I now forget; that just means moving all the objects to the weak cache. You can also expire objects, which means that any attribute access will cause the object to be requeried.

When you do a select and fetch a bunch of rows, if some of those rows describe objects already in memory, then those objects will be reused. There's a method (um, not documented, and I can't remember it) that will expire all objects, which you might want to do between requests. You can't do that for a single thread unless you are using per-thread connections; if you are doing process-wide connections in a threaded environment, all threads will share the same set of objects. There are other aspects too. It's all the many details that have kept me from documenting it :-/

> Don't blame it on the metaclasses! There's only like one, and all it
> does is call __classinit__. And I hope that __classinit__ is fairly
> easy to understand -- it should be. Well, the contents might be a
> bit crufty, but the concept is simple enough.

I won't claim the code is easy to read, but metaclasses aren't to blame either way ;)

-- 
Ian Bicking / ia...@co... / http://blog.ianbicking.org |
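The strong/weak scheme described above (a weak reference to everything, strong references culled every so often) can be modelled in a few lines. This is an illustrative sketch, not SQLObject's actual code; the class name, parameters, and culling policy are invented, and the final assertion relies on CPython's immediate refcount-based collection.

```python
import weakref

class TwoTierCache:
    """Toy model: every object gets a weak reference; strong references
    are culled periodically so idle objects can be garbage collected."""

    def __init__(self, cull_every=100, cull_fraction=0.5):
        self.weak = weakref.WeakValueDictionary()
        self.strong = {}
        self.cull_every = cull_every
        self.cull_fraction = cull_fraction
        self.fetches = 0

    def put(self, key, obj):
        self.weak[key] = obj       # always weakly referenced
        self.strong[key] = obj     # strongly held until culled

    def get(self, key):
        self.fetches += 1
        if self.fetches % self.cull_every == 0:
            self.cull()
        return self.weak.get(key)  # finds strongly- or weakly-held objects

    def cull(self):
        # Demote some strongly held objects; they now survive only while
        # something else still references them.
        for key in list(self.strong)[:int(len(self.strong) * self.cull_fraction)]:
            del self.strong[key]

    def expire_all(self):
        # "Clearing" here just means dropping all strong references.
        self.strong.clear()

class Row:                          # stand-in for a fetched table row
    def __init__(self, id):
        self.id = id

cache = TwoTierCache()
row = Row(1)
cache.put(1, row)
cache.expire_all()
assert cache.get(1) is row          # still alive: we hold a reference
del row                             # last strong reference gone...
assert cache.get(1) is None         # ...so the weak entry disappears (CPython)
```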
From: Ben B. <be...@gr...> - 2005-05-04 04:41:00
|
On May 3, 2005, at 9:04 PM, Ian Bicking wrote:

> Well, it means factoring out some of the SELECT stuff. Which
> might not be that hard; it's not like there's a lot of code
> points that are actually generating the stuff. In this case,
> there'd have to be some way to unpack multiple objects from a
> SELECT query, and a way to get that SELECT to include multiple
> tables.

Ok, trying to piece this together: the cache is used to speed up the joins. So if, during the join, there was some indicator to fetch related objects right there, and the objects were cached, would they be used during the lookup on them? Would that solve the problem?

I.e., if I have a Person table and an Address table, and I want all the people and their addresses in one query, it'd be something like this:

people = list(Person.select(include == Address)) # or however the
    syntax is worked to include the other table
for person in people:
    print person.firstName + ' lives in these cities: ' + \
        ','.join([address.city for address in person.addresses])

In the person.addresses fetch, it'd see the existing objects in the cache and not query the database?

> The obviousish way to do it would be by adding an option to
> .select()/SelectResult that would eagerly include tables (either
> the given tables, or all the tables in the join). Then you'd
> construct the other objects, but throw them away, relying on the
> cache. Which is a little iffy; presumably you really want to keep
> them around, but right now joins are only optimized through
> caching. That is, an object only knows the id of a joined table,
> but OtherClass.get(id) is fast when the object is in memory. But
> it can be garbage collected if it goes out of the cache.

This is what I'm hoping my example usage would do, assuming I understand how the caching is working. I'm also assuming I need to do list(), otherwise as I iterate through it would run a separate SELECT query each time, right?

> When you do a select and fetch a bunch of rows, if some of those
> rows describe objects already in memory, then those objects will be
> reused. There's a method (um, not documented and I can't remember
> it) that will expire all objects, which you might want to do
> between requests. You can't do that for a single thread unless you
> are using per-thread connections; if you are doing process-wide
> connections in a threaded environment, all threads will share the
> same set of objects.

Hmm, I'm running Apache 2 with prefork, so each SQLObject cache is hanging out in its own process. So I won't get any benefit from the object caching beyond the current request, as a different process is going to pick up the next request. I'm guessing this means I should search for that method to wipe out the object cache, so I can clear out the previous request's objects at the beginning of each request (plus I don't know if the db got updated in a different process).

I don't suppose there's any way to stash the objects into a shared memory cache that another Apache child process can get to?

Thanks, Ben |
From: Ian B. <ia...@co...> - 2005-05-04 15:57:58
|
Ben Bangert wrote:

> Ok, trying to piece this together, the cache is used to speed up the
> joins. So if during the join, there was some indicator to fetch related
> objects right there, and the objects were cached, would they be used
> during the lookup on them? Would that solve the problem?
>
> i.e., if I have a Person table and an Address table, and I want all the
> people and their addresses in one query, it'd be something like this:
>
> people = list(Person.select(include == Address)) # or however the
>     syntax is worked to include the other table
> for person in people:
>     print person.firstName + ' lives in these cities: ' +
>         ','.join([address.city for address in person.addresses])
>
> In the person.addresses fetch, it'd see the existing objects in the cache
> and not query the database?

No, a join like person.addresses always produces a query. address.person would not produce a query if the person was in memory. I've found it far too difficult to keep cache consistency of joins like this. So this will result in 1+(number of people selected) queries.

> This is what I'm hoping my example usage would do, assuming I understand
> how the caching is working. I'm also assuming I need to do list(),
> otherwise as I iterate through it would run a separate SELECT query each
> time, right?

Without list() the people are pulled in lazily from the connection, but there is still just one query. If you turn on debugging in your connection (?debug=t) then all the queries will be printed on stdout.

> Hmm, I'm running in Apache 2 with prefork, so each SQLObject cache is
> hanging out in its own process. So I won't get any benefit from the
> object caching beyond the current request as a different process is
> going to pick up the next request. I'm guessing this means I should
> search for that method to wipe out the object cache, so I can clear it
> at the beginning of the request (plus I don't know if the db got
> updated in a different process).
>
> I don't suppose there's any way to stash the objects into a shared
> memory cache that another apache child process can get to?

I do most of my programming in threads, so I don't really know. I'd have to ask you: is there a way to stash objects into a shared memory cache?

-- 
Ian Bicking / ia...@co... / http://blog.ianbicking.org |
From: Oleg B. <ph...@ma...> - 2005-05-04 16:05:37
|
On Wed, May 04, 2005 at 10:54:35AM -0500, Ian Bicking wrote:

> I do most of my programming in threads, so I don't really know. I'd
> have to ask you: is there a way to stash objects into a shared memory cache?

http://www.danga.com/memcached/
http://www.danga.com/memcached/apis.bml

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Ben B. <be...@gr...> - 2005-05-04 18:32:40
|
On 5/4/05 8:54 AM, "Ian Bicking" <ia...@co...> wrote:

>> for person in people:
>>     print person.firstName + ' lives in these cities: ' +
>>         ','.join([address.city for address in person.addresses])
>
> No, a join like person.addresses always produces a query.
> address.person would not produce a query if the person was in memory.
> I've found it far too difficult to keep cache consistency of joins like
> this. So this will result in 1+(number of people selected) queries.

So there's no way to have the lookup function the same, but have the objects pre-fetched? Is this a limitation of the implementation chosen? I'm just wondering what makes this so difficult, as ActiveRecord already has it; they've only been around a few months, yet they appear to have caught up and, in this aspect, passed SQLObject (which has been around almost 2 years?).

I've gone through more of the code, and you are right, there's only a metaclass or two, so I'm not going to blame them. :)

It is very difficult to go through the code and figure out what is happening, though; this is most likely because I'm not familiar enough with the architecture of how SQLObject generates the objects, etc. Is there some brief write-up on the methodology or architecture used, so that I can try to fit the code I see into the architecture in advance, rather than trying to figure out what you were thinking when you wrote it by examining it?

> I do most of my programming in threads, so I don't really know. I'd
> have to ask you: is there a way to stash objects into a shared memory cache?

I looked over memcached, which Oleg also mentioned; the Python API is a bit out of date and known to have some issues from what I can see. Is there anyone using Python in a large production environment that has come up with some other way to share objects through a shared memory cache scheme?

Thanks, Ben |
From: Oleg B. <ph...@ma...> - 2005-05-04 18:56:39
|
On Wed, May 04, 2005 at 11:32:33AM -0700, Ben Bangert wrote: > some other way to share objects through a shared memory cache scheme? http://poshmodule.sourceforge.net/ Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Ian B. <ia...@co...> - 2005-05-04 19:19:31
|
Ben Bangert wrote:

>>> for person in people:
>>>     print person.firstName + ' lives in these cities: ' +
>>>         ','.join([address.city for address in person.addresses])
>>
>> No, a join like person.addresses always produces a query.
>> address.person would not produce a query if the person was in memory.
>> I've found it far too difficult to keep cache consistency of joins like
>> this. So this will result in 1+(number of people selected) queries.
>
> So there's no way to have the lookup function the same, but have the objects
> pre-fetched? Is this a limitation of the implementation chosen? I'm just
> wondering what makes this so difficult, as ActiveRecord already has it and
> they've only had a few months yet they appear to have caught up and in this
> aspect passed SQLObject (which has been around almost 2 years?).

I don't think ActiveRecord does this either. You are asking for eager loading of a one-to-many join, which would mean you'd load all the people rows multiple times. And I guess that would be okay depending on your data (e.g., if most people had only one address), but it's a bit more aggressive than merely "eager" loading (at least how I view eager loading).

As to ActiveRecord, well, beats me what they are doing. I have no idea what their caching looks like either, so it may be more important for performance in their case. It's not architecturally impossible in SQLObject, but I haven't needed it, and now you are doubling up requests, as this is not the same request as before.

For instance, in SQLObject, if you have long-lived caches (either in a threaded environment, or maybe some multiprocess environments, depending on how you set it up), relatively static tables will result in few queries. E.g., sometimes I put "status" into a table, so there's a foreign key that refers to the object's status, and that table has a description and some rules and whatnot. This is efficient in SQLObject because that status table probably has ten fixed rows or so, and they get cached early and seldom need to be requeried. If ActiveRecord emitted a query every time someone asked for obj.status.name, then I can see why they'd need eager selects.

Of course, caching can be a curse as well, especially in multiprocess environments. I think it would be a much more important improvement for SQLObject to explicitly design a caching system where, for instance, you could mark classes "volatile" and they'd get purged more frequently; or other classes "stable" and they'd stay in memory for a long time; or you could cache objects between processes, or maybe only database row tuples would be cached between processes, or whatever. I haven't done this in part because I don't do much multiprocess programming, and so I don't have an intuitive feel for what the most important aspects are; but I'd certainly help someone who was interested in improving that.

> I've gone through more of the code, and you are right, there's only a
> metaclass or two, so I'm not going to blame them. :)
>
> It is very difficult to go through the code and figure out what is happening
> though, this is most likely because I'm not familiar enough with the
> architecture of how SQLObject generates the objects, etc. Is there some
> brief write-up on the methodology used or architecture so that I can try and
> fit the code I see in with the architecture in advance, rather than trying
> to figure out what you were thinking when you wrote it by examining it?

At one point I tried to comment most of the code, which describes things on a small scale, but a larger code map would be useful. I might be able to do that tomorrow.

-- 
Ian Bicking / ia...@co... / http://blog.ianbicking.org |
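The "volatile"/"stable" idea floated above could look something like the sketch below. This is a hypothetical design, not an existing SQLObject feature; the policy names, ages, and keys are all made up.

```python
import time

class PolicyCache:
    """Entries are tagged 'volatile' or 'stable'; a purge only evicts
    entries older than their policy's maximum age."""

    MAX_AGE = {'volatile': 1.0, 'stable': 3600.0}   # seconds, invented values

    def __init__(self):
        self.entries = {}             # key -> (obj, policy, stored_at)

    def put(self, key, obj, policy='volatile'):
        self.entries[key] = (obj, policy, time.time())

    def get(self, key):
        hit = self.entries.get(key)
        return hit[0] if hit else None

    def purge(self, now=None):
        now = time.time() if now is None else now
        for key, (obj, policy, stored) in list(self.entries.items()):
            if now - stored > self.MAX_AGE[policy]:
                del self.entries[key]

cache = PolicyCache()
cache.put('status:1', {'name': 'open'}, policy='stable')   # near-fixed rows
cache.put('person:7', {'name': 'Ben'}, policy='volatile')  # changes often
cache.purge(now=time.time() + 10)     # pretend 10 seconds have passed
assert cache.get('person:7') is None                 # volatile: purged
assert cache.get('status:1') == {'name': 'open'}     # stable: kept
```

A ten-row status table cached as "stable" would then survive between requests, while frequently edited rows would be requeried.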
From: Ben B. <be...@gr...> - 2005-05-04 19:51:39
|
On 5/4/05 12:16 PM, "Ian Bicking" <ia...@co...> wrote: > I don't think ActiveRecord does this either. You are asking for eager > loading of a one-to-many join, which would mean you'd load all the > people rows multiple times. And I guess that would be okay depending on > your data (e.g., if most people had only one address), but it's a bit > more aggressive than merely "eager" loading (at least how I view eager > loading). From the example they list, it does appear to eager load one-to-many join rows, and even multiple one-to-many join rows if desired. An example they show: # Turning N+1 queries into 1 for post in Post.find(:all, :include => [ :author, :comments ]) puts "Post: " + post.title puts "Written by: " + post.author.name puts "Last comment on: " + post.comments.first.created_on end While in SQLObject calling post.comments implicitly causes a join, using eager loading they've avoided that by specifying to load that in advance. That's all I'd like to be able to replicate in SQLObject, without the excess syntax for optimization in the FAQ on left-joins. > As to ActiveRecord, well, beats me what they are doing. I have no idea > what their caching looks like either, so it may be more important for > performance in their case. It's not architecturally impossible in > SQLObject, but I haven't needed it, and now you are doubling up > requests, as this is not the same request as before. My original request was to reduce my SQL queries on these fairly common requests where I pull a table and related one-to-many join tables. Part of the problem in my request, was that I didn't fully understand how SQLObject caches results to reduce future lookups. I still won't claim to fully understand this caching process, but until I can get it using a shared cache between processes its use is limited to each request. 
Maybe what I'm really requesting is more like David Cook's request to have a tuple returned when a join or left join returns multiple related objects as that would appear to have fetched all the objects in one query. I'm not sure I'd want it in a tuple, but if that's a quicker intermediate solution than somehow getting the post.comments join to check the cache first, I'd be happy to have that for now. > For instance, in SQLObject if you have long-lived caches (either in a > threaded environment, or maybe some multiprocess environments depending > how you set it up) relatively static tables will result in few queries. > E.g, sometimes I put "status" into a table, so there's a foreign key > that refers to the object's status, and that table has description and > some rules and whatnot. This is efficient in SQLObject because that > status table probably has ten fixed rows or so, and they get cached > early and seldom need to be requeried. If ActiveRecord emitted a query > everytime someone asked for obj.status.name, then I can see why they'd > need eager selects. Hmm, I think seeing more examples would help me out on some of these optimization aspects as well. I've looked through the examples in the distribution and found them fairly useful, is there any plans to put examples on the website or maybe have some way for other users to contribute their examples? (Sorry this is off topic) > > Of course, caching can be a curse as well, especially in multiprocess > environments. I think it would be a much more important improvement for > SQLObject to explicitly design a caching system where, for instance, you > could mark classes "volatile" and they'd get purged more frequently; or > other classes "stable" and they'd stay in memory for a long time; or you > could cache objects between processes, or maybe only database row tuples > would be cached between processes, or whatever. 
> I haven't done this in part because I don't do much multiprocess
> programming, and so I don't have an intuitive feel for what the most
> important aspects are; but I'd certainly help someone who was interested
> in improving that.

I'll look into some of the caching stuff Oleg helpfully pointed out. If I
wanted to have SQLObject cache to an alternate shared object caching
system, can you give me a pointer to where in the source I should be
looking?

Thanks,
Ben
|
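The eager-loading pattern the thread is after can be sketched in plain
Python with sqlite3 rather than SQLObject. This is an illustration only,
under an assumed person/address schema (the table and column names are
hypothetical, not SQLObject's): one LEFT JOIN replaces the 1+N SELECTs, and
the duplicated parent columns Ian mentions are grouped back into one record
per person.

```python
import sqlite3

# Hypothetical person/address schema standing in for the tables in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY,
                          person_id INTEGER REFERENCES person(id),
                          city TEXT);
    INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO address VALUES (10, 1, 'Boston'), (11, 1, 'Chicago'),
                               (12, 2, 'Denver');
""")

def people_with_addresses(conn):
    """One query instead of 1+N: LEFT JOIN parent to children, then group.

    As Ian notes, the person columns repeat once per address row; the
    grouping step collapses those duplicates back into one dict per person.
    """
    rows = conn.execute("""
        SELECT person.id, person.name, address.city
        FROM person LEFT JOIN address ON address.person_id = person.id
        ORDER BY person.id, address.id
    """)
    people = {}
    for pid, name, city in rows:
        rec = people.setdefault(pid, {"name": name, "addresses": []})
        if city is not None:          # LEFT JOIN row for a person with no addresses
            rec["addresses"].append(city)
    return people

people = people_with_addresses(conn)
```

The LEFT JOIN (rather than an inner join) keeps people with no addresses in
the result, at the cost of one NULL child row each.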
From: David M. C. <da...@da...> - 2005-05-05 04:41:05
|
On Wed, May 04, 2005 at 12:51:23PM -0700, Ben Bangert wrote:

> Maybe what I'm really requesting is more like David Cook's request to
> have a tuple returned when a join or left join returns multiple related
> objects as that would appear to have fetched all the objects in one
> query. I'm not sure I'd want it in a tuple, but if that's a quicker
> intermediate solution than

Would be easy enough to check for the 'include' keyword so that the
associated objects would be retrieved but not returned.

Dave Cook
|
From: David M. C. <da...@da...> - 2005-05-04 04:35:37
|
On Tue, May 03, 2005 at 07:54:45PM -0700, Ben Bangert wrote:

> result = Contact.find(:all, :include => :contact)

Oh, OK, I had it backwards. It's nothing to do with an outer join; it's
really just an inner join with retrieval of all columns for both tables:

    SELECT contact.*, customer.*
    FROM contact JOIN customer ON (contact.customer_id = customer.id)

Then just add the Customer objects to the cache as you do the Contact
objects (I have no idea how hard it would actually be to modify SQLObject
to do this). However, you can do it with 2 selects:

    list(Customer.select())
    Contact.select(blah)

From 1+N down to 2 is still pretty good. I think that first query can also
be narrowed down so you don't waste time fetching and instantiating
Customers without Contacts.

Dave Cook
|
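Dave's 2-select approach can be sketched in plain Python/sqlite3 (the
customer/contact schema below is hypothetical; in real SQLObject the first
select would populate its per-connection cache rather than an explicit
dict):

```python
import sqlite3
from collections import defaultdict

# Hypothetical customer/contact schema matching the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact  (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           email TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO contact  VALUES (1, 1, 'a@acme'), (2, 1, 'b@acme');
""")

# Query 1 -- the plain-SQL analogue of list(Customer.select()): every
# customer goes into a cache keyed by id.
customers = dict(conn.execute("SELECT id, name FROM customer"))

# Query 2 -- the analogue of Contact.select(...): each contact's customer
# lookup is now a cache hit, not a fresh SELECT, so 1+N becomes 2.
contacts = defaultdict(list)
for customer_id, email in conn.execute(
        "SELECT customer_id, email FROM contact ORDER BY id"):
    contacts[customer_id].append((email, customers[customer_id]))
```

Whatever the cache implementation, the key point is that query 2 never
triggers per-row lookups because query 1 already warmed the cache.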
From: David M. C. <da...@da...> - 2005-05-04 05:02:41
|
On Tue, May 03, 2005 at 09:35:21PM -0700, David M. Cook wrote:

> list(Customer.select())
> Contact.select(blah)
>
> From 1+N down to 2 is still pretty good. I think that first query can
> also be narrowed down so you don't waste time fetching and instantiating
> Customers without Contacts.

To follow up, this would give you the minimum number of instantiations:

    s1 = Customer.select(IN(Customer.q.id,
                            sqlbuilder.Select(Contact.q.customerID)))
    l = list(s1)
    s2 = Contact.select(...)

Whether this makes much of a difference or not depends on what fraction of
customers do not have contacts.

You can also do it with EXISTS and an outer join, but in my experience
using IN with a subquery is faster.

Dave Cook
|
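The SQL that this narrowed select would emit looks roughly like the
following (sketched against the same hypothetical customer/contact schema
with sqlite3; customer 2 has no contacts and is skipped):

```python
import sqlite3

# Hypothetical schema; customer 2 has no contacts, so the IN subquery
# filters it out before any object would be instantiated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact  (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id));
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO contact  VALUES (1, 1);
""")

# Roughly what Customer.select(IN(Customer.q.id, Select(...))) translates
# to: only customers that actually appear in contact come back.
narrowed = conn.execute("""
    SELECT id, name FROM customer
    WHERE id IN (SELECT customer_id FROM contact)
""").fetchall()
```

Which of IN, EXISTS, or an outer join wins is backend- and
statistics-dependent, so it is worth measuring on the actual database.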
From: Ben B. <bba...@gr...> - 2005-05-04 05:45:20
|
On May 3, 2005, at 10:02 PM, David M. Cook wrote:

> To follow up, this would give you the minimum number of instantiations:
>
>     s1 = Customer.select(IN(Customer.q.id,
>                             sqlbuilder.Select(Contact.q.customerID)))
>     l = list(s1)
>     s2 = Contact.select(...)
>
> Whether this makes much of a difference or not depends on what fraction
> of customers do not have contacts.
>
> You can also do it with EXISTS and an outer join, but in my experience
> using IN with a subquery is faster.

It would actually be 1+N down to X, where X is the number of additional
related tables to pull. So I'm thinking extending the join in some way to
cache objects from related tables would be the best solution, as from what
I've heard now, that should do the trick when a reference is made to the
related objects.

- Ben
|
From: David M. C. <da...@da...> - 2005-05-04 09:15:12
|
On Tue, May 03, 2005 at 10:45:33PM -0700, Ben Bangert wrote:

> It would actually be 1+N down to X, where X is the number of additional
> related tables to pull. So I'm thinking extending the join in some way
> to cache objects from related tables would be the best solution, as from
> what I've heard now, that should do the trick when a reference is made
> to the related objects.

I think ultimately the "right" way to do it is to refactor to allow for
tuples in the search result, so you can do something like

    Contact.select(Contact.q.customerID == Customer.q.id,
                   otherClasses=[Customer])

and get back results like

    contact0, customer0
    contact1, customer0
    ...

Then doing an eager fetch is just a special case of this. It looks like the
main stuff is going on in Iteration and queryForSelect; I think doing the
eager fetch with 1 query would involve rewriting most of those anyway.

Dave Cook
|
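The tuple-shaped result Dave sketches can be illustrated in plain
Python/sqlite3 (hypothetical schema; `otherClasses` itself is only a
proposed SQLObject extension here, not an existing keyword): one query
selects both tables' columns, and each row is sliced into a
(contact, customer) pair.

```python
import sqlite3

# Hypothetical customer/contact schema for the tuple-result sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact  (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           email TEXT);
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO contact  VALUES (1, 1, 'a@acme'), (2, 1, 'b@acme');
""")

# One query returning both tables' columns, sliced into (contact, customer)
# tuples -- the "contact0, customer0 / contact1, customer0" shape above.
# Note the customer tuple repeats once per contact; an id-keyed object
# cache would collapse those repeats into a single shared instance.
pairs = [
    ((cid, email), (kid, kname))
    for cid, email, kid, kname in conn.execute("""
        SELECT contact.id, contact.email, customer.id, customer.name
        FROM contact JOIN customer ON contact.customer_id = customer.id
        ORDER BY contact.id
    """)
]
```

Eager fetching then really is just this plus dropping the second element of
each tuple into the cache instead of returning it.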
From: Oleg B. <ph...@ma...> - 2005-05-04 09:08:08
|
On Tue, May 03, 2005 at 04:49:25PM -0700, Ben Bangert wrote:

> I frequently find myself using SQLObject to pull results and show
> related objects obtained by doing a MultipleJoin, however the SQL behind
> this, to my understanding, is a bit database heavy. If I pull a list of
> 30 people, then iterate over them, each iteration gets turned into a
> SELECT, each call to their related table is another SELECT, etc.
>
> Is there anything currently in SQLObject, or in the works, that will
> pull and cache more of this data in advance, and hopefully turn it into
> one query?

I did that optimization for inheritance - see
sqlobject/inheritance/iteration.py - the method fetchChildren() prefetches
data for a number of rows at once. IWBN if someone ported the idea to the
main Iteration class.

Oleg.
-- 
Oleg Broytmann     http://phd.pp.ru/     ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
|
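The batch-prefetch idea behind fetchChildren() can be sketched like this in
plain Python/sqlite3. This is only an illustration of the technique (the
chunking, the schema, and the `prefetch_addresses` helper are all
hypothetical, not SQLObject's actual code): one IN(...) query per chunk of
parent ids replaces one SELECT per parent.

```python
import sqlite3
from collections import defaultdict

# Hypothetical person/address schema for the prefetch sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY,
                          person_id INTEGER REFERENCES person(id),
                          city TEXT);
    INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO address VALUES (10, 1, 'Boston'), (11, 2, 'Denver');
""")

def prefetch_addresses(conn, person_ids, chunk=500):
    """Prefetch child rows for a batch of parent ids.

    One parameterized IN(...) query per chunk of ids, instead of one
    SELECT per parent; chunking keeps the statement size bounded.
    """
    children = defaultdict(list)
    for i in range(0, len(person_ids), chunk):
        batch = person_ids[i:i + chunk]
        marks = ", ".join("?" * len(batch))
        query = ("SELECT person_id, city FROM address "
                 "WHERE person_id IN (%s) ORDER BY id" % marks)
        for pid, city in conn.execute(query, batch):
            children[pid].append(city)
    return children

addresses = prefetch_addresses(conn, [1, 2])
```

Porting this to the main Iteration class would mean collecting the ids of
each fetched window of rows and issuing one such prefetch per join.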