I've found it pretty easy to get distinct objects from a resultset using a
generator:
# generator that returns distinct elements from a sorted iterable
def distinct_iter(seq):
    previous = None
    for item in seq:
        if item != previous:
            yield item
            previous = item

result = distinct_iter(klass.select(constraint, orderBy=klass._idName))
for obj in result:
    # do something with obj
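
As a quick illustration of why the input has to be sorted, here is a minimal standalone sketch. Plain integers stand in for the selected objects (an assumption made for the example), and this variant uses a sentinel object instead of None so that a leading None item would not be skipped; that is an adjustment for the sketch, not part of the code above.

```python
def distinct_iter(seq):
    # Unique sentinel: compares unequal to any real item, including None.
    previous = object()
    for item in seq:
        if item != previous:
            yield item
            previous = item

# Sorted input: equal items are adjacent, so every duplicate is dropped.
sorted_ids = [1, 1, 2, 3, 3, 3, 7]
unique_ids = list(distinct_iter(sorted_ids))

# Unsorted input: only adjacent repeats are dropped, so 1 appears twice.
unsorted_ids = [1, 2, 1]
not_unique = list(distinct_iter(unsorted_ids))
```

Only neighbouring duplicates are compared, which is why the select above is ordered by the ID column.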
However, it bothered me that there was a large upfront cost (the time
between the user clicking a button and the data being displayed) if I wanted
to know the number of distinct objects before iterating through the
resultset, e.g. to get the fraction by which to increment a progress bar.
A generator has no length, so the whole resultset has to be consumed before
the count is known.
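
The cost can be seen in a small sketch. The lazy_rows generator and the produced list are hypothetical stand-ins for a database-backed resultset; they only record when each row is generated.

```python
def lazy_rows(ids, produced):
    # Hypothetical stand-in for a lazy resultset: logs each row as it is produced.
    for i in ids:
        produced.append(i)
        yield i

produced = []
# To learn the count, the generator has to be run to the end first.
result = list(lazy_rows([10, 20, 30], produced))
count = len(result)

# By the time count is known, every row has already been produced.
all_rows_fetched = (produced == [10, 20, 30])

# Only now can a per-item progress fraction be computed.
step = 1.0 / count
```

With a real resultset, building that list means creating every object before anything can be displayed.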
So I came up with a "cheat" using _connection.queryAll():
def selectIDs(klass, constraint, distinct=False):
    connection = klass._connection
    table = klass._table
    idName = klass._idName
    fromstr = ', '.join(constraint.tablesUsed())
    if distinct:
        template = 'select distinct %s.%s from %s where %s'
    else:
        template = 'select %s.%s from %s where %s'
    sql = template % (table, idName, fromstr, constraint)
    return connection.queryAll(sql)

rows = selectIDs(klass, constraint, distinct=True)
count = len(rows)
# do something with count
for row in rows:
    obj = klass(*row)
    # do something with obj
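
To make the generated query concrete, here is a sketch of just the string-building step, with no database involved. The function name build_select_ids_sql and the table, column, and constraint strings are hypothetical values chosen for the example; in selectIDs they come from the class and the constraint object.

```python
def build_select_ids_sql(table, id_name, tables_used, where, distinct=False):
    # Same template logic as selectIDs, but with plain strings as inputs.
    from_str = ', '.join(tables_used)
    if distinct:
        template = 'select distinct %s.%s from %s where %s'
    else:
        template = 'select %s.%s from %s where %s'
    return template % (table, id_name, from_str, where)

# Hypothetical two-table join; only the ID column of the first table is fetched.
sql = build_select_ids_sql(
    'person', 'id',
    ['person', 'address'],
    'person.id = address.person_id',
    distinct=True,
)
```

Because only IDs are selected, the database does the de-duplication and the full list of rows is cheap to fetch, so len(rows) is available before any object is created.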
For my data the upfront cost of the second method is less than a tenth of
that of the first method. Also, the total time for the second method is
much shorter when the objects are already cached, whereas the first method
doesn't seem to get any faster.
Dave Cook