Thread: [SQLObject] Inefficient MultipleJoin
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
|
From: Andrey L. <an...@mi...> - 2004-07-18 17:20:14
|
Hi,
I'm evaluating SQLObject and it MultipleJoin functionality works in
highly inefficient way for me. I have the following simple postgresql
database:
CREATE TABLE public.customer
(
id serial NOT NULL,
dn varchar(255) NOT NULL,
title varchar(255),
discount int2,
pricing int4,
balance float8 NOT NULL DEFAULT 0,
CONSTRAINT customer_pkey PRIMARY KEY (id),
CONSTRAINT "$1" FOREIGN KEY (pricing) REFERENCES public.pricing (id) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT customer_dn_key UNIQUE (dn)
)
CREATE TABLE public.pricing
(
id serial NOT NULL,
date date NOT NULL,
description text,
CONSTRAINT pricing_pkey PRIMARY KEY (id)
)
And theese SQLObject classes:
class Customer(SQLObject):
dn = StringCol()
balance = FloatCol()
pricing = ForeignKey('Pricing', dbName='pricing')
class Pricing(SQLObject):
date = DateCol()
description = StringCol()
customers = MultipleJoin('Customer', joinColumn='pricing')
So, now the problem:
When I do:
pricings = Pricing.select()
for prsg in pricings:
print prsg.customers
SQLObjects first fetches list of ids in 'customer' table and then executes
additional queries for EACH found id, like this:
2/QueryAll: SELECT id FROM customer WHERE pricing = 4
2/COMMIT : auto
2/QueryOne: SELECT dn, discount, balance, pricing FROM customer WHERE id = 28
2/COMMIT : auto
2/QueryOne: SELECT dn, discount, balance, pricing FROM customer WHERE id = 2
2/COMMIT : auto
2/QueryOne: SELECT dn, discount, balance, pricing FROM customer WHERE id = 187
2/COMMIT : auto
2/QueryOne: SELECT dn, discount, balance, pricing FROM customer WHERE id = 186
2/COMMIT : auto
2/QueryOne: SELECT dn, discount, balance, pricing FROM customer WHERE id = 170
2/COMMIT : auto
2/QueryOne: SELECT dn, discount, balance, pricing FROM customer WHERE id = 161
...
...
I must admit this way of doing things is terribly unscalable. Isn't it
better to do single "SELECT id, dn, discount, ... FROM customer WHERE
pricing = 4"? Or am I doing something wrong?
Thanks in advance.
--
Andrey Lebedev aka -.- . -.. -.. . .-.
Software engineer at UAB Mikromarketingas (http://micro.lt)
Homepage: http://micro.lt/~andrey/
Jabber ID: ke...@ja...
|
|
From: Charles B. <li...@st...> - 2004-07-18 18:50:49
|
Hi Andrey, > I must admit this way of doing things is terribly unscalable. Isn't it > better to do single "SELECT id, dn, discount, ... FROM customer WHERE > pricing = 4"? Or am I doing something wrong? > I think the behavior that you're observing is in fact the way that SQLObject does things. The reason being if your select were to return thousands of rows of data but you only need the first 10, 10 specific calls would be more efficient than grabbing all 1000 rows. (Especially by reusing the same connection?) I'm no database expert though... From the docs: "Select results are generators, which are lazily evaluated. So the SQL is only executed when you iterate over the select results, or if you use list() to force the result to be executed. When you iterate over the select results, rows are fetched one at a time. This way you can iterate over large results without keeping the entire result set in memory. You can also do things like .reversed() without fetching and reversing the entire result -- instead, SQLObject can change the SQL that is sent so you get equivalent results. You can also slice select results. The results are used in the SQL query, so peeps[:10] will result in LIMIT 10 being added to the end of the SQL query. If the slice cannot be performed in the SQL (e.g., peeps[:-10]), then the select is executed, and the slice is performed on the list of results. This will only happen when you use negative indexes." Hope that helps... -Charles. |
|
From: Andrey L. <an...@mi...> - 2004-07-18 20:26:41
|
On Sun, Jul 18, 2004 at 01:50:43PM -0500, Charles Brandt wrote:
> Hi Andrey,
>
> > I must admit this way of doing things is terribly unscalable. Isn't it
> > better to do single "SELECT id, dn, discount, ... FROM customer WHERE
> > pricing = 4"? Or am I doing something wrong?
> >
>
> I think the behavior that you're observing is in fact the way that SQLObject
> does things. The reason being if your select were to return thousands of
> rows of data but you only need the first 10, 10 specific calls would be more
> efficient than grabbing all 1000 rows. (Especially by reusing the same
> connection?) I'm no database expert though...
I'm no database expert too, but I feel that not always true. Each query
consists of transferring over wires, syntax parsing, planning,
optimizing, locking and unlocking tables, etc. Fetching records one by
one, generating masses of queries is not a way to go IMHO. Again, I have
no benchmark results whatsoever, just intuition.
>
> From the docs:
>
> "Select results are generators, which are lazily evaluated. So the SQL is
> only executed when you iterate over the select results, or if you use list()
> to force the result to be executed. When you iterate over the select
> results, rows are fetched one at a time. This way you can iterate over large
> results without keeping the entire result set in memory. You can also do
> things like .reversed() without fetching and reversing the entire result --
> instead, SQLObject can change the SQL that is sent so you get equivalent
> results.
> You can also slice select results. The results are used in the SQL query, so
> peeps[:10] will result in LIMIT 10 being added to the end of the SQL query.
> If the slice cannot be performed in the SQL (e.g., peeps[:-10]), then the
> select is executed, and the slice is performed on the list of results. This
> will only happen when you use negative indexes."
Well, I'm afraid that's not really a case. At least [:n] notion doesn't
work here (doesn't result in LIMIT statement). When I change driver
script to::
pricings = Pricing.select()
for prsg in pricings:
print prsg.id
print prsg.customers[:10]
script still generates lots and lots of "SELECT dn, balance, pricing
FROM customer WHERE id = 160"-like statements.
Join.py line 100:
def performJoin(self, inst):
ids = inst._connection._SO_selectJoin(
self.otherClass,
self.joinColumn,
inst.id)
if inst._SO_perConnection:
conn = inst._connection
else:
conn = None
return self._applyOrderBy([self.otherClass(id, conn) for (id,) in ids if id is not None], self.otherClass)
I can see no way where LIMIT statement or similar python limiting mechanism is
inserted. I can be wrong here of course.
I'm using SQLObject-0.5.2 and python-3.3.4
I'm currently looking at several object-relational mappers for python
and "usage" part of SQLObject is cleanly the best! But I'm a bit afraid
to base big and somewhat performance dependant project on it...
BTW, do you aware of any MSSQL Server 2000 connectors for SQLObject?
Thanks
--
Andrey Lebedev aka -.- . -.. -.. . .-.
Software engineer at UAB Mikromarketingas (http://micro.lt)
Homepage: http://micro.lt/~andrey/
Jabber ID: ke...@ja...
|