Re: [Haskelldb-users] HaskellDB efficience

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Krasimir Angelov wrote:
> --- Bjorn Bringert <d00...@dt...> wrote:
> 
>>Hi Krasimir,
>>
>> the problem is in the code that you use to print the query. You are
>> not actually seeing the query that gets executed since the query is
>> not optimized yet. If you use Database.HaskellDB.showSql, you will
>> see the optimized query.
>>
>>/Bjorn
> 
> 
> Thanks! With showSql the things are much better. Now I
> got:
> 
> SELECT DISTINCT docDate
> FROM  documents as T1
> 
> This is better but why is there DISTINCT keyword. Can
> I get the query without it? The SQL script is even
> worse when I have two tables in query:

No, there is no way to not get DISTINCT in the query. This was a design 
choice made in the original HaskellDB, presumably to be closer to 
relational algebra where relations do not contain duplicate records. If 
you want to return duplicate records, you can include some unique value 
(probably the key) for each record in the results.

> [****** Haskell ******]
> import Database.HaskellDB
> import Database.HaskellDB.Query
> import Database.HaskellDB.Sql
> import Database.HaskellDB.GenericConnect
> import Text.PrettyPrint
> import Testdb.Documents
> 
> main = do
>   let primQuery = do
> 	       docs1 <- table documents
> 	       docs2 <- table documents
> 	       restrict (docs1!xid .==. docs2!xid)
> 	       project (docDate << docs1!docDate #
> 	                docDate << docs2!docDate)
>   putStrLn (render (showSql primQuery))

There is a problem with this query, the results contain the docDate 
field twice. Unfortunately the record type system currently does not 
detect this error. However, by using a technique from HList presented at 
HW2004, such a check could probably be implemented.

> [****** SQL ******]
> SELECT DISTINCT docDate1 as docDate,
>                 docDate1 as docDate
> FROM  (SELECT DISTINCT id as id2,
>                        docDate as docDate2
>        FROM  documents as T1) as T1,
>       (SELECT DISTINCT id as id1,
>                        docDate as docDate1
>        FROM  documents as T1) as T2
> WHERE (id1 = id2)
> 
> There are two derived tables, three distincts and one
> cross join. Can I get the optimal query?
> 
> SELECT T1.docDate as docDate1,
>        T1.docDate as docDate2
> FROM documents as T1
> JOIN documents as T2 on T1.id = T2.id

HaskellDB currently does not produce JOIN clauses. Doing so might be 
possible, but could be difficult in the general case. There is probably 
a lot of room for improvement in the SQL optimization, have a look at 
the code for Database.HaskellDB.Optimize if you want to play with it. 
OTOH, the RDBMS also does optimization, so it might not always be clear 
that client-side optimizations are worthwhile.

/Bjorn