[Haskelldb-users] Query composition bug

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

One of the benefits of haskelldb is the ability to decompose a query
into component parts and put them together in different ways. I
believe I have a found a bug in the library that makes this hard to
do. This literate email demonstrates the bug. The full source is given
at the end, but I want to talk about the cause here. Imagine two
queries that project one aggregate column each:

tbl1  = do
  x <- table ...
  restrict ...
  project $ ... id1 << count ( ... )

tbl2 = do
   x <- table ...
   restrict ..
   project $ .. id2 << count ( ... )

When each query is run individually, the SQL produced is as expected
(i.e., "select xx as id1 from (select count(...) as xx from ..)").
When the queries are combined by hand:

cmb1 = do
  x1 <- table ..
  x2 <- table ..
  restrict ( x1 ! ...)
  restrict (x2 ! ...)
  project $ id1 << count ( ... ) # id2 << count (...)

The SQL produced is also as expected:

SELECT id13 as id1,
       id23 as id2
FROM (SELECT COUNT(id11) as id13,
             COUNT(id22) as id23
      FROM (SELECT id2 as id22
            FROM sample_table as T1
            WHERE id1 < 0) as T1,
           (SELECT id1 as id11
            FROM sample_table as T1
            WHERE id1 > 0) as T2) as T1

The aggregates are at the same level, which is very important.
However, if the individual queries are reused:

cmb2 = do
  x1 <- tbl1
  x2 <- tbl2
  project $ id1 << x1 ! id1 # id2 << x2 ! id 2

the SQL produced is NOT correct. It will execute but the aggregates
are at different levels:

SELECT id12 as id1,
       id24 as id2
FROM (SELECT COUNT(id23) as id24,
             id12
      FROM (SELECT id2 as id23
            FROM sample_table as T1
            WHERE id1 < 0) as T1,
           (SELECT COUNT(id11) as id12
            FROM (SELECT id1 as id11
                  FROM sample_table as T1) as T1
            WHERE id11 > 0) as T2
      GROUP BY id12) as T1

When this query is executed, it will return NO rows if "select ID2 as
ID23 .." returns no rows. In the correct query, 0 is returned in both
columns (as expected with aggregates). That looks like a bug to me
(though possibly its a quirk in postgres).

The same SQL is produced by this, which is definitely counter-intuitive:

cmb3 = do
  t1 <- do { x <- table ...; restrict ...; project $ id1 << count (x ! id1) }
  t2 <- do { x <- table ...; restrict ...; project $ id2 << count (x ! id2) }
  project $ id1 << t1 ! id1 #
            id2 << t2 ! id2

The bug occurs when the t2 value is constructed. The "table" operation
causes a "Binary Times" value to be placed in the PrimQuery being
built, BEFORE the aggregate projection is added. I would expect the
query for t2 to be built in isolation first, and then added via Times
to the existing PrimQuery.

This structure can be seen by loading this email into GHCi and
entering cmb1Qry, cmb2Qry, and cmb3Qry at the prompt. tbl1Qry and
tbl2Qry also show the individual queries in isolation. tbl1SQL,
tbl2SQL, etc. are provided which show the SQL generated by each
different query.

There really isn't a workaround for this bug, except to redefine
queries. I am not sure how to fix it as I tried many different
approaches. Any suggestions?

Justin

\begin{code}

import Database.HaskellDB
import Database.HaskellDB.DBLayout

tbl1 :: Query (Rel (RecCons Id1 (Expr Int) RecNil))
tbl1 = do
  t1 <- table sample_table
  restrict (t1 ! id1 .>. constant 0)
  project $ id1 << count (t1 ! id1)

tbl2 :: Query (Rel (RecCons Id2 (Expr Int) RecNil))
tbl2 = do
  t1 <- table sample_table
  restrict (t1 ! id1 .<. constant 0)
  project $ id2 << count (t1 ! id2)

cmb1 :: Query (Rel (RecCons Id1 (Expr Int) (RecCons Id2 (Expr Int) RecNil)))
cmb1 = do
  t1 <- table sample_table
  t2 <- table sample_table
  restrict (t1 ! id1 .>. constant 0)
  restrict (t2 ! id1 .<. constant 0)
  project $ id1 << count(t1 ! id1) #
            id2 << count(t2 ! id2)

cmb2 :: Query (Rel (RecCons Id1 (Expr Int) (RecCons Id2 (Expr Int) RecNil)))
cmb2 = do
    t1 <- tbl1
    t2 <- tbl2
    project $ id1 << t1 ! id1 #
              id2 << t2 ! id2

cmb3 :: Query (Rel (RecCons Id1 (Expr Int) (RecCons Id2 (Expr Int) RecNil)))
cmb3 = do
  t1 <- do { x <- table sample_table; restrict (x ! id1 .>. constant
0); project $ id1 << count (x ! id1) }
  t2 <- do { x <- table sample_table; restrict (x ! id1 .<. constant
0); project $ id2 << count (x ! id2) }
  project $ id1 << t1 ! id1 #
            id2 << t2 ! id2

tbl1SQL = putStrLn $ showSql tbl1
tbl2SQL = putStrLn $ showSql tbl2

cmb1SQL = putStrLn $ showSql cmb1
cmb2SQL = putStrLn $ showSql cmb2
cmb3SQL = putStrLn $ showSql cmb3

tbl1Qry = putStrLn $ showQuery tbl1
tbl2Qry = putStrLn $ showQuery tbl2

cmb1Qry = putStrLn $ showQuery cmb1
cmb2Qry = putStrLn $ showQuery cmb2
cmb3Qry = putStrLn $ showQuery cmb3

data Id1 = Id1
data Id2 = Id2

instance FieldTag Id1 where fieldName _ = "id1"
instance FieldTag Id2 where fieldName _ = "id2"

id1 :: Attr Id1 Int
id1 = mkAttr Id1

id2 :: Attr Id2 Int
id2 = mkAttr Id2

sample_table :: Table (RecCons Id1 (Expr Int) (RecCons Id2 (Expr Int) RecNil))
sample_table = baseTable "sample_table" $
          hdbMakeEntry Id1 #
          hdbMakeEntry Id2

\end{code}