Thread: [SQLObject] getting distinct objects and counting them

I've found it pretty easy to get distinct objects from a resultset using a
generator:

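  # Generator that yields the distinct elements of a sorted iterable.
  # Equal items must be adjacent, so the input has to be ordered.
  def distinct_iter(seq):
      previous = object()  # sentinel that compares unequal to any real item
      for item in seq:
          if item != previous:
              yield item
          previous = item

  # Ordering by the id column puts duplicate rows next to each other.
  result = distinct_iter(klass.select(constraint, orderBy=klass._idName))
  for obj in result:
      pass  # do something with obj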
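
To try the generator without a database, here is a minimal sketch on plain lists. It uses a sentinel object as the starting value instead of None, which is my own variation so that a leading None in the input is not dropped; the list names are made up for illustration. It also shows why the orderBy matters: on unsorted input, duplicates that are not adjacent slip through.

```python
def distinct_iter(seq):
    previous = object()  # sentinel (assumption: a variation on None as the start value)
    for item in seq:
        if item != previous:
            yield item
        previous = item

sorted_ids = [1, 1, 2, 3, 3, 3]    # hypothetical ids, already ordered
unsorted_ids = [1, 2, 1]           # the second 1 is not adjacent to the first

deduped = list(distinct_iter(sorted_ids))
not_deduped = list(distinct_iter(unsorted_ids))
print(deduped)       # [1, 2, 3]
print(not_deduped)   # [1, 2, 1]
```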

However, it bothered me that there was a large upfront cost (the time
between the user clicking a button and the data being displayed) if I wanted
to know the number of distinct objects before iterating through the
resultset, e.g. to work out the fraction by which to advance a progress bar.
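
To illustrate where that upfront cost comes from, here is a rough sketch with a simulated lazy resultset. fake_resultset and fetched are hypothetical stand-ins, not SQLObject API. A generator has no length, so getting the count means pulling every row through it first:

```python
def distinct_iter(seq):
    previous = object()
    for item in seq:
        if item != previous:
            yield item
        previous = item

fetched = []  # records every row pulled from the simulated resultset

def fake_resultset(rows):
    # Stand-in for a lazy database cursor: each row is "fetched" on demand.
    for row in rows:
        fetched.append(row)
        yield row

rows = [1, 1, 2, 3, 3]
objs = list(distinct_iter(fake_resultset(rows)))  # exhausts the generator
count = len(objs)
# By now every row has been fetched, before any progress could be shown.
```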

So I came up with a "cheat" using _connection.queryAll():

  def selectIDs(klass, constraint, distinct=False):
      connection = klass._connection
      table = klass._table
      idName = klass._idName
      # every table the constraint refers to goes into the FROM clause
      fromstr = ', '.join(constraint.tablesUsed())
      if distinct:
          template = 'select distinct %s.%s from %s where %s'
      else:
          template = 'select %s.%s from %s where %s'
      sql = template % (table, idName, fromstr, constraint)
      # each returned row is a 1-tuple holding an id
      return connection.queryAll(sql)

  rows = selectIDs(klass, constraint, distinct=True)
  count = len(rows)  # known before any object is instantiated
  # do something with count
  for row in rows:
      obj = klass(*row)
      # do something with obj
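
The same idea can be tried outside SQLObject. The following is only a sketch using the standard sqlite3 module; the artist/album schema, the select_ids helper and the where string are all invented for illustration. It shows that a join returns one id per matching row, that "select distinct" collapses them in the database, and that len() on the fetched rows gives the count before any objects are built.

```python
import sqlite3

# Hypothetical schema: one artist has many albums.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table artist (id integer primary key, name text);
    create table album (id integer primary key, artist_id integer, year integer);
    insert into artist values (1, 'A'), (2, 'B'), (3, 'C');
    insert into album values (1, 1, 1999), (2, 1, 2001), (3, 2, 2003), (4, 3, 1980);
""")

def select_ids(conn, table, from_tables, where, distinct=False):
    # Table names and the where clause are trusted constants here, never user input.
    keyword = "select distinct" if distinct else "select"
    sql = "%s %s.id from %s where %s" % (keyword, table, ", ".join(from_tables), where)
    return conn.execute(sql).fetchall()

where = "album.artist_id = artist.id and album.year > 1990"
plain_rows = select_ids(conn, "artist", ["artist", "album"], where)
distinct_rows = select_ids(conn, "artist", ["artist", "album"], where, distinct=True)

print(len(plain_rows))     # one row per matching album
print(len(distinct_rows))  # one row per matching artist
ids = sorted(row[0] for row in distinct_rows)
```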

For my data, the upfront cost of the second method is less than a tenth of
that of the first method.  Also, the total time for the second method is
much shorter for cached objects, whereas the first method doesn't seem to
get any faster.

Dave Cook

SQLObject is a Python ORM.

sqlobject-discuss