Thread: [SQLObject] INSERT rows with foreign keys while avoiding extra lookup?
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: kgi <ia...@gm...> - 2005-08-08 14:52:02
|
I'm trying to work out if something is possible with SQLObject; I'm not sure if it's right in front of my nose and I can't see it, or whether it's just not possible and I'm being unreasonable :-) I've got a lot of tables that simple name<->id lookups, and a few other tables that are just a bunch of foreign keys. My problem is that I don't know how to insert a new row in the "bunch of foreign keys" table if I don't have the original objects to hand without re-looking up the name<->id mapping, taking N SQL queries for each insert. A bit of background: this code is in a server that will be receiving commands over TCP, and it creates objects in response to these commands. Many of the name<->id mappings have lots of rows (where "lots" is in the region 10^5 or 10^6, so I don't want to get into caching, since access is inconsistent: a cached value might not be used again for a very long time. A simplified example is shown below. The "ExampleSchema" and "ExampleField" tables are simple name<->id maps. A given schema may contain, say, a few tens of the many possible fields (and there might be thousands of schemata). If it's not possible, I'll have to write some sort of wrapper to perform the insert manually, possibly coupled with a cache to handle cache hits if any. Regards, Ricky -- [start example code] from sqlobject import * # Assume we have a defined connection URI here ... from connection import conn class ExampleSchema ( SQLObject ): _connection = conn schemaName = StringCol ( length = 32, notNone = True, unique = True, alternateID = True ) class ExampleField ( SQLObject ): _connection = conn fieldName = StringCol ( length = 32, notNone = True, unique = True, alternateID = True ) class ExampleSchemaField ( SQLObject ): _connection = conn schema = ForeignKey ( "ExampleSchema" ) field = ForeignKey ( "ExampleField" ) sfIndex = DatabaseIndex ( 'schema', 'field', unique = True ) if __name__ == "__main__": # Drop tables ExampleSchemaField.dropTable ( ifExists = True ) ExampleField.dropTable ( ifExists = True ) ExampleSchema.dropTable ( ifExists = True ) # Recreate tables ExampleSchema.createTable ( ifNotExists = True ) ExampleField.createTable ( ifNotExists = True ) ExampleSchemaField.createTable ( ifNotExists = True ) ExampleSchema ( schemaName = "myschema" ) ExampleField ( fieldName = "myfield" ) # Now let's assume that I do *not* have a reference to the # ExampleSchema and ExampleField objects created above. # Instead, I just have the string values, "myschema" and # "myfield". # # I would like to be able to do this (or a variant), and have the # fact that I'm using something which is a foreign key be recognised # and dereferenced automatically. # # ExampleSchemaField ( schema = "myschema", field = "myfield" ) # # But instead, I have to do something like: myschema = ExampleSchema.selectBy ( schemaName = "myschema" ) [ 0 ] myfield = ExampleField.selectBy ( fieldName = "myfield" ) [ 0 ] # This also works, of course: myschema2 = ExampleSchema.bySchemaName ( "myschema" ) myfield2 = ExampleField.byFieldName ( "myfield" ) ExampleSchemaField ( schema = myschema, field = myfield ) # However, presumably this is inefficient, since it performs two # lookups instead of a single insert with two joins. |
From: Ian B. <ia...@co...> - 2005-08-08 15:18:28
|
kgi wrote: > I'm trying to work out if something is possible with SQLObject; I'm not sure > if it's right in front of my nose and I can't see it, or whether it's just > not possible and I'm being unreasonable :-) > > I've got a lot of tables that simple name<->id lookups, and a few other tables > that are just a bunch of foreign keys. > > My problem is that I don't know how to insert a new row in the "bunch of > foreign keys" table if I don't have the original objects to hand without > re-looking up the name<->id mapping, taking N SQL queries for each insert. Just look them up; the lookups are fairly heavily cached, so basically the values will be loaded lazily. The values are eventually expired as well (basically random objects are removed from the cache and put in the weak cache; if you are still using them there's a good chance they'll get moved back to the other cache before they actually get garbage collected). I think the result should be fairly decent performance for this case -- in the worse case it's N queries, but if you have actual load then the cache should be full and you'll just do one query. -- Ian Bicking / ia...@co... / http://blog.ianbicking.org |
From: kgi <ia...@gm...> - 2005-08-16 10:58:41
|
On Monday 08 August 2005 18:17, Ian Bicking wrote: > kgi wrote: > > I'm trying to work out if something is possible with SQLObject; I'm not > > sure if it's right in front of my nose and I can't see it, or whether > > it's just not possible and I'm being unreasonable :-) > > > > I've got a lot of tables that simple name<->id lookups, and a few other > > tables that are just a bunch of foreign keys. > > > > My problem is that I don't know how to insert a new row in the "bunch of > > foreign keys" table if I don't have the original objects to hand without > > re-looking up the name<->id mapping, taking N SQL queries for each > > insert. > > Just look them up; the lookups are fairly heavily cached, so basically > the values will be loaded lazily. The values are eventually expired as > well (basically random objects are removed from the cache and put in the > weak cache; if you are still using them there's a good chance they'll > get moved back to the other cache before they actually get garbage > collected). I think the result should be fairly decent performance for > this case -- in the worse case it's N queries, but if you have actual > load then the cache should be full and you'll just do one query. Thanks for your reply, Ian. That's interesting to know (and the whole random expiry thing is quite clever the more I think about it, although it seems alarming and counterintuitive at the beginning). Is there a stable way of determining whether something is in the cache (that is, something in the API rather than something I can get by doing lots of dir()s on random hidden attributes). The reason I want this is that I'm mixing SQLObject code with Twisted code. I know that Twisted offers its own DB and object mapping layer that works in The Twisted Way, but I preferred SQLOBject, so I'm willing to go the extra mile to be able to mix the two. If a read request comes in and it's in the cache, then I can probably do everything immediately and not bother deferring to a separate thread. If something's not in the cache then I'm going to have to defer to one or more separate threads of execution. The problem, of course, is knowing whether something is in the cache or not; by the time I've tried to do a lookup, I've already committed to a DB read if it's not in the cache, and that might affect my scalability since it will briefly block the Twisted thread. Cheers, Ricky |
From: Ian B. <ia...@co...> - 2005-08-16 15:41:52
|
kgi wrote: > That's interesting to know (and the whole random expiry thing is quite clever > the more I think about it, although it seems alarming and counterintuitive at > the beginning). > > Is there a stable way of determining whether something is in the cache (that > is, something in the API rather than something I can get by doing lots of > dir()s on random hidden attributes). Unfortunately there isn't a stable API for this, though there should be. > The reason I want this is that I'm mixing SQLObject code with Twisted code. I > know that Twisted offers its own DB and object mapping layer that works in > The Twisted Way, but I preferred SQLOBject, so I'm willing to go the extra > mile to be able to mix the two. > > If a read request comes in and it's in the cache, then I can probably do > everything immediately and not bother deferring to a separate thread. If > something's not in the cache then I'm going to have to defer to one or more > separate threads of execution. > > The problem, of course, is knowing whether something is in the cache or not; > by the time I've tried to do a lookup, I've already committed to a DB read if > it's not in the cache, and that might affect my scalability since it will > briefly block the Twisted thread. I think it would make sense to add async methods to a base class that all your SQLObject instances use. Other people might find this useful too if they are in that environment. Initially those new methods would always do everything in threads, but later you could shortcircuit cases where you knew the operation wouldn't block (like getting a cached item). Unfortunately, though, every attribute access can block. But in practice (if you aren't using expire()) it's only the foreign keys that are likely to block (but they are quite likely to block). -- Ian Bicking / ia...@co... / http://blog.ianbicking.org |