|
From: Henning T. <has...@he...> - 2008-07-10 21:24:10
|
On Thu, 10 Jul 2008, Justin Bailey wrote:
> By the way, another way to write your query is:
>
> do
> p <- table Points.points
> unique
> project (Points.c << p!Points.c)
>From the description of 'unique' it was not clear to me, which 'project'
the 'unique' refers to. At first I thought, 'unique' refers to the last
'project' before 'unique'. But then I thought, that 'project' is just like
'return' and thus 'unique' cannot "see" it. So, is the location of
'unique' irrelevant? Isn't it better then to make it a function of type
unique :: Query (Rel r) -> Query (Rel r)
which is not applied by monadic binding, but as a transformation of the
whole query? Currently it's implemented by GROUP BY but it could also be
implemented by DISTINCT, right?
Now I thought a bit more about grouping and aggregations. When writing
database queries in the Query monad then I compare that with the list
monad. Tables or multi-sets could just be seen as lists with irrelevant
order. However, in the list monad a 'unique' function, that is applied by
monadic bind, could not be implemented, because the monadic bind always
feeds single elements to its second operand. But 'unique' must have access
to the whole table in order to check for duplicates. So, is it sensible to
have the 'unique' function how it currently is, if it can only be
implemented in terms of query expressions but not in terms of real data
(namely lists)?
I think a big deficiency of SQL is, that it tries to handle three types of
query answers in the same query form. There are
r -- scalar types as answers produced by aggregations
Rel r -- the answer of a regular query
Rel (Rel r)
-- an intermediate type that arises when grouping
I hoped 'unique' would give us a way to avoid the last type, but it
seems, that I was wrong.
Firstly I liked, if HaskellDB would be more precise than SQL with respect
to the types. If aggregations would have type
(Rel r -> r)
then it would not be possible to accidentally apply an aggregation twice
(AVG(AVG(x2))), mix aggregations with simple column accesses (SELECT
avg(x1), x2 FROM ...) and there would be no need to check whether the list
returned by the query indeed consists of a single element.
Then there are aggregations in conjunction with grouping, where they turn
an intermediate nested relation of type (Rel (Rel a)) back into a (Rel b).
A function which groups and aggregates may have type
groupAndAggregate ::
(r -> a) -- select columns or other values to group for
-> (a -> Rel r -> b)
-- aggregation for each group with access to the grouping criterion
-> Rel r
-> Rel b
|