Thread: [SQLObject] DISTINCT and COUNT
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: Marcin W. <wo...@un...> - 2004-09-10 10:01:27
|
Hi, I needed to answer question like this: -- How many people have at least one address with given zip SELECT COUNT(DISTINCT person.id) FROM person, address WHERE address.person_id =3D person.id AND address.zip =3D 50482 I modified main.py with applied distinct patch and it worked (with MySQL)= . Perhaps someone will find it useful or will find an error in it: def count(self): """ Counting elements of current select results """ - count =3D self.accumulate('COUNT(*)') + if self.ops.get('distinct'): + count =3D self.accumulate('COUNT(DISTINCT %s.%s)' % ( + self.sourceClass._table, + self.sourceClass._idName)) + else: + count =3D self.accumulate('COUNT(*)') if self.ops.get('start'): count -=3D self.ops['start'] BTW, are there any plans to include distinct and index patches to SQLObje= ct? BTW 2, in docs/News.txt it is written about cascade: The constraints are only implemented in the DBMS, not in SQLObject (i.e., they will not work in databases like MySQL and SQLite). but they seem to be implemented also in SQLObject and they work in MySQL. At least I hope so :-). I rely on it. Marcin --=20 Marcin Wojdyr | http://www.unipress.waw.pl/~wojdyr |
From: Jeremy F. <je...@go...> - 2004-09-10 22:02:45
|
On Fri, 2004-09-10 at 12:05 +0200, Marcin Wojdyr wrote: > Hi, > I needed to answer question like this: > > -- How many people have at least one address with given zip > SELECT COUNT(DISTINCT person.id) FROM person, address > WHERE address.person_id = person.id AND address.zip = 50482 I thought about this, but SQLite doesn't support the count(distinct ...) syntax - you need to use a sub-select. So I punted and left it to stuff which should be common to all databases. > BTW, are there any plans to include distinct and index patches to SQLObject? I haven't got any feedback about them. I guess I should stick them in the SF patch manager... J |
From: Ian B. <ia...@co...> - 2004-09-13 23:01:10
|
Jeremy Fitzhardinge wrote: > On Fri, 2004-09-10 at 12:05 +0200, Marcin Wojdyr wrote: > >>Hi, >>I needed to answer question like this: >> >>-- How many people have at least one address with given zip >>SELECT COUNT(DISTINCT person.id) FROM person, address >>WHERE address.person_id = person.id AND address.zip = 50482 > > > I thought about this, but SQLite doesn't support the count(distinct ...) > syntax - you need to use a sub-select. So I punted and left it to stuff > which should be common to all databases. > > >>BTW, are there any plans to include distinct and index patches to SQLObject? > > > I haven't got any feedback about them. I guess I should stick them in > the SF patch manager... Sorry, I've been moving and haven't had good internet access. I'm still a little confused about what distinct really means. No two rows can be indistinct if you are including the primary key in the select. So what does distinct accomplish? If you could select a subset of the columns, then distinct would be useful. That could be useful, probably as another method (e.g., MyClass.selectColumns(['firstName', 'lastName'])); and there distinct would make sense. The index patch looks good and makes sense to me. -- Ian Bicking / ia...@co... / http://blog.ianbicking.org |
From: Jeremy F. <je...@go...> - 2004-09-13 23:31:57
Attachments:
distinct-select.patch
|
On Mon, 2004-09-13 at 18:01 -0500, Ian Bicking wrote: > I'm still a little confused about what distinct really means. No two > rows can be indistinct if you are including the primary key in the > select. So what does distinct accomplish? If you could select a subset > of the columns, then distinct would be useful. That could be useful, > probably as another method (e.g., MyClass.selectColumns(['firstName', > 'lastName'])); and there distinct would make sense. Well, one example is when joining. I had a join table with a particular pairing appearing multiple times, so when joining with it, I got the corresponding other tables multiple times. Also, I think I've observed MySQL returning a single row multiple times if you use a select statement containing an OR, and the row matches multiple predicates (but I may be mistaken). One deficiency in this patch is that there's no way to apply "distinct" to SQLObject joins (ones done via join columns). The other, as Marcin pointed out, that it doesn't apply to count(), but that seems to require larger structural changes. Also, this patch copies a little too much from the way reverse() works. Reverse is its own opposite, so it makes sense for select().reverse(). reverse() to be a no-op, but distinct() shouldn't toggle like this (updated patch attached). > The index patch looks good and makes sense to me. Great, thanks. We still need to create indexes on SQLObject's automatically created join tables. J |
From: Andrew B. <and...@pu...> - 2004-09-14 00:53:27
|
On Mon, Sep 13, 2004 at 06:01:09PM -0500, Ian Bicking wrote: [...] > > I'm still a little confused about what distinct really means. No two > rows can be indistinct if you are including the primary key in the > select. So what does distinct accomplish? If you could select a subset > of the columns, then distinct would be useful. That could be useful, > probably as another method (e.g., MyClass.selectColumns(['firstName', > 'lastName'])); and there distinct would make sense. That would be good too. But distinct on all columns can still be useful if you're doing a join that might include the same row multiple times. -Andrew. |
From: David M. C. <da...@da...> - 2004-09-14 06:57:25
|
On Mon, Sep 13, 2004 at 06:01:09PM -0500, Ian Bicking wrote: > I'm still a little confused about what distinct really means. No two > rows can be indistinct if you are including the primary key in the > select. So what does distinct accomplish? If you could select a subset I think you're thinking of distinct in the sense of distinguishable. SQL distinct returns a set of rows without duplications like the unix uniq command. This is most useful when combined with count(distinct) (or the sqlite workaround of using a subselect) when you need a count of distinct rows in a set without the upfront cost of iterating through the whole set and removing duplicates. For example, res = Composer.select(Work.q.composerID==Composer.q.id) returns a set of musical composers who have musical works in the database. It takes about 1.9 seconds on my machine to iterate through the resultset to get the actual count of distinct composers (the additional time to keep track of duplicates in a dict is not significant), which makes a big difference in the responsiveness (or lack thereof) of an interactive application. Dave Cook |
From: Ian B. <ia...@co...> - 2004-09-14 15:49:37
|
David M. Cook wrote: > On Mon, Sep 13, 2004 at 06:01:09PM -0500, Ian Bicking wrote: > > >>I'm still a little confused about what distinct really means. No two >>rows can be indistinct if you are including the primary key in the >>select. So what does distinct accomplish? If you could select a subset > > > I think you're thinking of distinct in the sense of distinguishable. SQL > distinct returns a set of rows without duplications like the unix uniq > command. This is most useful when combined with count(distinct) (or the > sqlite workaround of using a subselect) when you need a count of distinct rows > in a set without the upfront cost of iterating through the whole set and > removing duplicates. For example, > > res = Composer.select(Work.q.composerID==Composer.q.id) OK, that makes more sense to me. It feels a little odd still; it would be most natural to do this with a query like: SELECT composer.all columns... FROM composer WHERE EXISTS (SELECT * FROM work WHERE work.composer_id = composer.id); Not all databases can do that, of course (particularly MySQL), but maybe it would be better to make the SELECT DISTINCT a workaround, and expose something like EXISTS. -- Ian Bicking / ia...@co... / http://blog.ianbicking.org |
From: David M. C. <da...@da...> - 2004-09-14 23:58:47
|
On Tue, Sep 14, 2004 at 10:49:37AM -0500, Ian Bicking wrote: > SELECT composer.all columns... FROM composer WHERE EXISTS (SELECT * FROM > work WHERE work.composer_id = composer.id); Using IN works as well, and is probably more universal (tested in postgres and sqlite) SELECT id FROM composer WHERE id IN (SELECT composer_id FROM work) Dave Cook |
From: David M. C. <da...@da...> - 2004-09-15 03:18:29
|
On Tue, Sep 14, 2004 at 04:58:40PM -0700, David M. Cook wrote: > SELECT id FROM composer WHERE id IN (SELECT composer_id FROM work) Got it! r = Composer.select(IN(Composer.q.id, [SQLBuilder.Select(Work.q.composerID)])) In [87]: r.clause Out[87]: (composer.id IN (SELECT work.composer_id FROM work)) In [88]: r.count() Out[88]: 1420 Dave Cook |