[Haskelldb-users] GROUP BY, unique and aggregations

On Thu, 10 Jul 2008, Justin Bailey wrote:

> By the way, another way to write your query is:
>
> do
>        p <- table Points.points
>        unique
>        project (Points.c << p!Points.c)

From the description of 'unique' it was not clear to me which 'project'
the 'unique' refers to. At first I thought 'unique' refers to the last
'project' before it. But then I thought that 'project' is just like
'return', and thus 'unique' cannot "see" it. So is the location of
'unique' irrelevant? Would it not be better, then, to make it a function
of type
    unique :: Query (Rel r) -> Query (Rel r)
which is not applied by monadic binding, but as a transformation of the
whole query? Currently it is implemented by GROUP BY, but it could also be
implemented by DISTINCT, right?

Now I have thought a bit more about grouping and aggregations. When
writing database queries in the Query monad, I compare it with the list
monad. Tables or multisets can simply be seen as lists whose order is
irrelevant. However, in the list monad a 'unique' function that is applied
by monadic bind could not be implemented, because the bind always feeds
single elements to its second operand, whereas 'unique' must have access
to the whole table in order to check for duplicates. So is it sensible to
have the 'unique' function as it currently is, if it can only be
implemented in terms of query expressions but not in terms of real data
(namely lists)?
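
To make the list analogy concrete, here is a minimal sketch. It assumes
that a relation is modelled as a plain list; the names 'Rel', 'projectC'
and the list-level 'unique' are hypothetical and are not HaskellDB's API.
The bind in 'projectC' only ever sees one row, so duplicate removal has to
be written as a function on the whole list.

```haskell
import Data.List (nub)

-- Assumption: a relation is a list whose order we ignore.
type Rel r = [r]

-- List analogue of the quoted query without 'unique': project column c.
projectC :: Rel (Int, Char) -> Rel Char
projectC points = do
  (_, c) <- points   -- bind hands over a single row at a time
  return c           -- 'return' plays the role of 'project'

-- Duplicate removal needs the whole relation, so it is a
-- transformation Rel r -> Rel r rather than a step inside the do block.
unique :: Eq r => Rel r -> Rel r
unique = nub
```

With these definitions, duplicates are removed by applying 'unique' to the
finished query, as in 'unique (projectC rows)'.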

I think a big deficiency of SQL is that it tries to handle three types of
query answers in the same query form. There are

r            -- a scalar type, as the answer produced by an aggregation
Rel r        -- the answer of a regular query
Rel (Rel r)  -- an intermediate type that arises when grouping

I had hoped 'unique' would give us a way to avoid the last type, but it
seems that I was wrong.
First, I would like HaskellDB to be more precise than SQL with respect
to the types. If aggregations had type
   (Rel r -> r)
then it would not be possible to accidentally apply an aggregation twice
(AVG(AVG(x2))) or to mix aggregations with simple column accesses (SELECT
avg(x1), x2 FROM ...), and there would be no need to check whether the list
returned by the query indeed consists of a single element.
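
A small sketch of how such a type would rule out these mistakes. It
assumes a hypothetical newtype 'Rel' wrapping a list, so that relations
and scalars cannot be confused; 'avg' and 'count' are illustrative names,
not HaskellDB functions.

```haskell
-- Assumption: a relation is an opaque wrapper around a list of rows.
newtype Rel r = Rel [r]

-- An aggregation consumes a whole relation and yields a scalar.
-- Note: the average of an empty relation is NaN in this sketch.
avg :: Rel Double -> Double
avg (Rel xs) = sum xs / fromIntegral (length xs)

count :: Rel r -> Int
count (Rel xs) = length xs

-- 'avg (avg r)' is rejected by the type checker, because 'avg r' is a
-- Double and not a Rel Double. The result is also a plain scalar, so
-- there is no one-element list to unpack.
```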
Then there are aggregations in conjunction with grouping, where they turn
an intermediate nested relation of type (Rel (Rel a)) back into a (Rel b).
A function that groups and aggregates might have type
   groupAndAggregate ::
         (r -> a)   -- select columns or other values to group by
      -> (a -> Rel r -> b)
                    -- aggregation for each group with access to the grouping criterion
      -> Rel r
      -> Rel b
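
For illustration, the signature above can be implemented for plain lists.
This is only a sketch under two assumptions that are not part of the
signature as stated: a relation is a list, and the grouping criterion has
an 'Ord' instance so that rows can be sorted before grouping.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Assumption: a relation is a list whose order we ignore.
type Rel r = [r]

groupAndAggregate ::
     Ord a                 -- added assumption: keys can be sorted
  => (r -> a)              -- grouping criterion
  -> (a -> Rel r -> b)     -- aggregation per group, sees the criterion
  -> Rel r
  -> Rel b
groupAndAggregate key aggregate rows =
  [ aggregate (key r) grp
    -- sortOn brings equal keys together; groupBy then splits them into
    -- non-empty groups, i.e. the intermediate Rel (Rel r).
  | grp@(r : _) <- groupBy ((==) `on` key) (sortOn key rows)
  ]
```

For example, 'groupAndAggregate fst (\k grp -> (k, sum (map snd grp)))'
sums the second column for each distinct value of the first column. The
nested relation exists only inside the function and never appears in the
result type.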