Thread: [SQLObject] SelectResults internal
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: Didrik P. <dp...@it...> - 2006-08-01 05:48:23
|
Hi, I'm trying to improve the performance of some modules in an application. I'm facing the following problem with this snippet of code : # get the validdata row vdata =3D validdata.select(AND(validdata.q.stationID =3D=3D station.id, validdata.q.mtime =3D=3D row.mtime))[0] # compare and update states for state in states: # check state current_state =3D getattr(vdata, state) ... The tables are the following : class validdata(SQLObject): mtime =3D DateTimeCol(notNone=3DTrue) station =3D ForeignKey('stations',notNone=3DTrue)=09 tha =3D DecimalCol( size=3D3,precision=3D1,default=3DNone) tha_state =3D ForeignKey('validstates', notNone=3DTrue, default=3D1) tsa =3D DecimalCol( size=3D3,precision=3D1,default=3DNone) tsa_state =3D ForeignKey('validstates', notNone=3DTrue, default=3D1) ... class validstates(SQLObject): name =3D StringCol(alternateID=3DTrue, notNone=3DTrue, unique=3D= True) description =3D StringCol() When parsing the validdata table to check for the states of the different data attributes, I'm selecting the data for a specific stationID and mtime with : [1] vdata =3D validdata.select(AND(validdata.q.stationID =3D=3D station.id, validdata.q.mtime =3D=3D row.mtime))[0] then I parse the states in this row with the following code : [2] for state in states: getattr(vdata, state) My understanding is that vdata is a SelectResults and that the getatrr(vdata, state) will just fetch the data from this SelectResults. It appears that I'm wrong.=20 It seems that for each getattr(vdata, state) there is a new call to the database. Here is the SQL output for expression [1] : 1/QueryR : SELECT validdata.id, validdata.mtime, validdata.station_id, validdata.tsa, validdata.tsastate_id, validdata.tha, validdata.thastate_id, validdata.hra, validdata.hrastate_id, validdata.tsf, validdata.tsfstate_id, validdata.tss, validdata.tssstate_id, validdata.ens, validdata.ensstate_id, validdata.dvt, validdata.dvtstate_id, validdata.vvt, validdata.vvtstate_id, validdata.plu, validdata.plustate_id FROM validdata WHERE ((validdata.station_id =3D 1) AND (validdata.mtime =3D '2006-01-04 05:00:00')) LIMIT 1 Then, a query for each getattr : 1/QueryOne: SELECT thastate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT thastate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT tsastate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT tsastate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT tssstate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT tssstate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT tsfstate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT tsfstate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT ensstate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT ensstate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT dvtstate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT dvtstate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT vvtstate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT vvtstate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT plustate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT plustate_id FROM validdata WHERE id =3D 78 1/COMMIT : auto 1/QueryOne: SELECT hrastate_id FROM validdata WHERE id =3D 78 1/QueryR : SELECT hrastate_id FROM validdata WHERE id =3D 78 What is the correct way of doing this in order to minimize the database queries ? Didrik |
From: Oleg B. <ph...@ma...> - 2006-08-01 08:32:49
|
On Tue, Aug 01, 2006 at 07:47:48AM +0200, Didrik Pinte wrote: > [1] vdata = validdata.select(AND(validdata.q.stationID == station.id, > validdata.q.mtime == row.mtime))[0] > > then I parse the states in this row with the following code : > > [2] for state in states: > getattr(vdata, state) > > My understanding is that vdata is a SelectResults and that the Wrong. You have used [0] to index the SelectResults object and hence get an SQLObject instance. > getatrr(vdata, state) will just fetch the data from this SelectResults. SelectResults doesn't hold any data itself - it is only a way to get a list of SQLObject instances. You've got an SQLObject instance already, and you fetches data from it. > It seems that for each getattr(vdata, state) there is a new call to the > database. Because you are fetching data via ForeignKey columns, and SQLObject doesn't cache them (but it caches the referenced rows). Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Didrik P. <dp...@it...> - 2006-08-01 09:27:25
|
Hi Oleg, Thank you for the clear explanation. I've added some comments and questions below : Le mardi 01 ao=FBt 2006 =E0 12:32 +0400, Oleg Broytmann a =E9crit : > On Tue, Aug 01, 2006 at 07:47:48AM +0200, Didrik Pinte wrote: > > [1] vdata =3D validdata.select(AND(validdata.q.stationID =3D=3D station= .id, > > validdata.q.mtime =3D=3D row.mtime))[0] > >=20 > > then I parse the states in this row with the following code : > >=20 > > [2] for state in states: > > getattr(vdata, state) > >=20 > > My understanding is that vdata is a SelectResults and that the >=20 > Wrong. You have used [0] to index the SelectResults object and hence g= et > an SQLObject instance. You're right. I understood it after having send the message to the list... >=20 > > It seems that for each getattr(vdata, state) there is a new call to the > > database. >=20 > Because you are fetching data via ForeignKey columns, and SQLObject > doesn't cache them (but it caches the referenced rows). Ok, so it means that the following code will always require two queries in the same table in order to get the foreign key value ? # select the row vdata =3D validata.select(AND(validdata.q.stationID =3D=3D station.id, validdata.q.mtime =3D=3D row.mtime))[0] # get the tha state id vdata.tha_state_id That's why it's so slow. I've 4.10^6 rows with 8 states per row. That means 7* 4.10^6 unwanted queries. Is there a more efficent way to do that using direct sqlobject calls ? Otherwise, I will use sqlbuilder and fetch the 8 states in one time. -- Didrik |
From: Oleg B. <ph...@ma...> - 2006-08-01 09:56:57
|
On Tue, Aug 01, 2006 at 11:26:48AM +0200, Didrik Pinte wrote: > Ok, so it means that the following code will always require two queries > in the same table in order to get the foreign key value ? > > # select the row > vdata = validata.select(AND(validdata.q.stationID == station.id, > validdata.q.mtime == row.mtime))[0] > # get the tha state id > vdata.tha_state_id I think so. > That's why it's so slow. I've 4.10^6 rows with 8 states per row. That > means 7* 4.10^6 unwanted queries. You certainly don't want to process them row by row. It would be too slow in any case. > Otherwise, I will use sqlbuilder and fetch the 8 states in one time. This is probably the best approach. Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Didrik P. <dp...@it...> - 2006-08-01 10:08:40
|
Le mardi 01 ao=FBt 2006 =E0 13:56 +0400, Oleg Broytmann a =E9crit : >=20 > > That's why it's so slow. I've 4.10^6 rows with 8 states per row. That > > means 7* 4.10^6 unwanted queries. >=20 > You certainly don't want to process them row by row. It would be too > slow in any case. Unluckily, I have to ;-) Thanks for the help. Didrik |
From: Luke O. <lu...@me...> - 2006-08-01 14:50:13
|
> Then, a query for each getattr : > 1/QueryOne: SELECT thastate_id FROM validdata WHERE id = 78 > 1/QueryR : SELECT thastate_id FROM validdata WHERE id = 78 > 1/COMMIT : auto > 1/QueryOne: SELECT tsastate_id FROM validdata WHERE id = 78 > 1/QueryR : SELECT tsastate_id FROM validdata WHERE id = 78 > 1/COMMIT : auto This isn't what I would expect, using cacheValues = True (default in sqlmeta). And if there are really just 8 possible states (or some other small number of shared states, not clear), those should be fairly quickly cached as instances so that validdata records that refer to an already loaded state row shouldn't be executing any queries. I would expect one query for the select, which (using cacheValues) should populate the instances with the foreign key IDs (so vdata.thastate_id shouldn't be another query), and then initially a query for each not-yet-cached validstate instance. I've verified this in my own use of SQLObject (using svn SQLObject and connection.cache=True, sqlmeta.cacheValues=True). - Luke |
From: Didrik P. <dp...@it...> - 2006-08-01 16:18:08
|
Le mardi 01 ao=FBt 2006 =E0 09:50 -0500, Luke Opperman a =E9crit : > > Then, a query for each getattr : > > 1/QueryOne: SELECT thastate_id FROM validdata WHERE id =3D 78 > > 1/QueryR : SELECT thastate_id FROM validdata WHERE id =3D 78 > > 1/COMMIT : auto > > 1/QueryOne: SELECT tsastate_id FROM validdata WHERE id =3D 78 > > 1/QueryR : SELECT tsastate_id FROM validdata WHERE id =3D 78 > > 1/COMMIT : auto >=20 > This isn't what I would expect, using cacheValues =3D True (default in sq= lmeta). > And if there are really just 8 possible states (or some other small numbe= r of > shared states, not clear), those should be fairly quickly cached as insta= nces > so that validdata records that refer to an already loaded state row shoul= dn't > be executing any queries. >=20 > I would expect one query for the select, which (using cacheValues) should > populate the instances with the foreign key IDs (so vdata.thastate_id > shouldn't be another query), and then initially a query for each > not-yet-cached validstate instance. I've verified this in my own use of > SQLObject (using svn SQLObject and connection.cache=3DTrue, > sqlmeta.cacheValues=3DTrue). >=20 > - Luke Hi Luke, =46rom what i've heard/read, because I need transaction support on those tables, I can't use an sqlmeta.cacheValues=3DTrue. Question for the list : is it (still) true ? I'm using the latest debian package for etch (0.7.0-4). Didrik |
From: Oleg B. <ph...@ma...> - 2006-08-01 16:30:54
|
On Tue, Aug 01, 2006 at 06:17:31PM +0200, Didrik Pinte wrote: > From what i've heard/read, because I need transaction support on those > tables, I can't use an sqlmeta.cacheValues=True. > > Question for the list : is it (still) true ? I think it's an outdated requirement. I use transactions with cacheValues=True. Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Luke O. <lu...@me...> - 2006-08-01 16:46:44
|
That restriction was based on the lack of synchronization of instance caches, applied in r1581 & a further fix in r1683. As I recall, those are both after 0.7. However, if you are *exclusively* using transactions (don't have a main plain connection whose instance cache matters), you can still use cacheValues=True, or you can manually call .sync() on objects outside of the transaction that you want to re-query the database for changes due to a transaction. - Luke P.S. Oleg, SF Patch 1524615 is the last outstanding issue with synchronization that I'm aware of, the problem it solves was introduced with the close argument to Transaction.commit(). With that applied, the docs for the next release should remove the warning about cacheValues & Transactions. Quoting Oleg Broytmann <ph...@ma...>: > On Tue, Aug 01, 2006 at 06:17:31PM +0200, Didrik Pinte wrote: >> From what i've heard/read, because I need transaction support on those >> tables, I can't use an sqlmeta.cacheValues=True. >> >> Question for the list : is it (still) true ? > > I think it's an outdated requirement. I use transactions with > cacheValues=True. > > Oleg. > -- |
From: Oleg B. <ph...@ma...> - 2006-08-01 17:11:39
|
On Tue, Aug 01, 2006 at 11:46:41AM -0500, Luke Opperman wrote: > P.S. Oleg, SF Patch 1524615 is the last outstanding issue with synchronization > that I'm aware of, the problem it solves was introduced with the close > argument to Transaction.commit(). With that applied, the docs for the next > release should remove the warning about cacheValues & Transactions. Though I had been in a hospital I read all the mail after I have returned home. Your message with the patch is now in my queue. Yesterday I started to work on SQLObject again - I did "svn update" (there were some patches in formencode/validators.py and a lot of patches in */setuptools/__init__.py, but there were no patches in sqlobject) and ran the test suite - just to test it's still passed. It is passed. Now I am going to start applying and testing patches... Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |