|
From: Jeremy S. <jer...@li...> - 2005-01-24 19:12:58
|
Hello,
I hacked around with the optimizer and sql generator so that many
sub-queries get rewritten as joins. I think that we should be able to
rewrite most, if not all, queries with sub-queries. (In general, not
all sql statements can be rewritten with out sub-queries, but I think
that the subset of relational algebra currently implement by haskelldb
does not have any cases that require sub-queries. )
That is good because, (1) mysql 4.0 does not support sub-queries (2)
mysql 4.1 does, but they are painfully slow. For example, the
following query takes around 10 seconds when generated as a sub-query
and 0.2 seconds with the subqueries removed. (There are around ~2000
bugs that block bug 7178).
taskQuery =
do d <- table D.dependencies
b <- table B.bugs
restrict (d!D.blocked .==. constant 7178)
restrict (b!B.bug_id .==. d!D.dependson)
project ( B.bug_id << b!B.bug_id #
B.assigned_to << b!B.assigned_to #
B.bug_status << b!B.bug_status #
B.short_desc << b!B.short_desc #
B.priority << b!B.priority
)
I have not gotten around to finishing and verifying the correctness of
this code, so it should be considered experimental. But I wanted to
get some early feedback in case I am totally screwing up ;)
If you have access to the book (hint: college library):
Database System Implementations
by Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom
http://www.amazon.com/exec/obidos/tg/detail/-/0130402648/102-9632978-2282521?v=glance
I highly recommend you read the beginnings of Chapters 6 and 7.
In Chapter 6, they defined a variation of relational algebra as on
algebra on bags instead of sets. They also extend traditional
relational algebra to include operators such as group-by, order-by,
and select distinct.
In chapter 7 they introduce the two-operator select, and a bunch of
algebraic laws for transforming queries. Note that the laws are
defined for bags, not sets.
I wonder if adopting the relational algebra defined in this book could
prove to be an important step in making haskelldb more practical for
the real world. It seems like a fairly straightward project if someone
is looking for a class project, or just has some time to kill.
The biggest 'problem' with the book is it is intended for people who
want to write a database. So, they are assuming SQL->Relational
Algebra->Optimized Physical Query Plan. We are interested in the
oppsition direction, Relation Algebra->SQL. This just means that some
of the goals of optimizing are counter-productive. For example, in
SQL->RA, it is beneficial to push the projections to the bottom
layer. But the patch I submit, I actually attempt to float the
projections to the top because it allows me to write a simlper query
that MySQL can execute faster.
Quick Overview of what the patch does:
In the SQL generator I add a new special case:
binary Times (SqlTable t1) (SqlTable t2) = newSelect { tables = [("",SqlTable t1),("",SqlTable t2)] }
This causes the generator to generate a join in the case where you
have the project of two tables.
For example,
do t1 <- table table1
t2 <- table table2
project ( <everything> ... )
would be written as:
SELECT * FROM table1, table2;
Then in the optimizer, I attempt to float projections upward, so that
all of my products are the product of two simple tables. When it
floats the projections, it will also automatically merge adjacent
projections. It think this may also have the side effect of removing
the need for removeEmpty and removeDead, but now I can't remember :) I
suspect I also disabled removeRestrict, because it will try to push
things below the product, which is the exact opposite of what I want
it most cases. I think we only want to push restrictions down, if the
query has an unremovable two operator select (and, as far as I can
tell, haskelldb does not have a two operator select).
Jeremy Shaw.
ps. This patch is against 0.8, because that is the last version that
built for me :)
* modified files
--- orig/src/Optimize.hs
+++ mod/src/Optimize.hs
@@ -15,7 +15,7 @@
-- $Revision: 1.16 $
-----------------------------------------------------------
-module Database.HaskellDB.Optimize (optimize) where
+module Database.HaskellDB.Optimize (optimize, floatProject, mergeProject) where
import Control.Exception (assert)
import Data.List (intersect)
@@ -24,9 +24,10 @@
-- | Optimize a PrimQuery
optimize :: PrimQuery -> PrimQuery
optimize = mergeProject
- . removeEmpty
- . removeDead
- . pushRestrict
+ . floatProject
+-- . removeEmpty
+-- . removeDead
+-- . pushRestrict
-- | Remove unused attributes from projections.
removeDead :: PrimQuery -> PrimQuery
@@ -160,6 +161,40 @@
safe assoc
= not (any (nestedAggregate.snd) assoc)
+floatProject :: PrimQuery -> PrimQuery
+
+-- | Does not check for conflicting names
+floatProject (Binary Times query1 query2) =
+ let query1' = mergeProject $ floatProject query1
+ query2' = mergeProject $ floatProject query2
+ in
+ case (query1',query2') of
+ ((Project assoc1 q1),(Project assoc2 q2)) -> Project (assoc1 ++ assoc2) (Binary Times q1 q2)
+ ((Project assoc1 q1),q2) -> Project assoc1 (Binary Times q1 q2)
+ (q1,(Project assoc2 q2)) -> Project assoc2 (Binary Times q1 q2)
+ (q1,q2) -> Binary Times q1 q2
+
+-- | Have not figured out what other binary ops I can float past
+floatProject (Binary op query1 query2) = Binary op (floatProject query1) (floatProject query2)
+
+floatProject (Project assoc query) = (Project assoc (floatProject query))
+
+-- | float the project past a restrict
+floatProject (Restrict x query) =
+ case floatProject query of
+ Project assoc q ->
+ let x' = substAttr assoc x
+ in
+ (Project assoc (floatProject (Restrict x' q)))
+ q -> (Restrict x q)
+
+-- | Have not figured out if/when I can float past a special op
+floatProject (Special op query) = (Special op (floatProject query))
+
+-- | Termination case
+floatProject q@(BaseTable _ _) = q
+
+
-- | Push restrictions down through projections and binary ops.
pushRestrict :: PrimQuery -> PrimQuery
--- orig/src/Sql.hs
+++ mod/src/Sql.hs
@@ -54,6 +54,7 @@
| SqlBin String SqlSelect SqlSelect
| SqlTable TableName -- ^ Select a whole table named TableName.
| SqlEmpty -- ^ Empty select.
+ deriving Show
-- | Data type representing the SQL UPDATE statement.
@@ -131,12 +132,16 @@
sql = toSelect q
-- binary assumes that q1 and q2 are not empty
- binary Times q1 q2
- | null (attrs q1) = addTable q1 q2
- | null (attrs q2) = addTable q2 q1
- | otherwise = newSelect { tables = [("",q1),("",q2)] }
- where
- addTable sql q = sql{ tables = tables sql ++ [("",q)] }
+ binary Times (SqlTable t1) (SqlTable t2) = newSelect { tables = [("",SqlTable t1),("",SqlTable t2)] }
+ binary Times q1 q2
+ | null (criteria q1) && null (criteria q2) = q1 { tables = tables q1 ++ tables q2
+ , attrs = attrs q1 ++ attrs q2
+ }
+ | null (attrs q1) = addTable q1 q2
+ | null (attrs q2) = addTable q2 q1
+ | otherwise = newSelect { tables = [("",q1),("",q2)] }
+ where
+ addTable sql q = sql{ tables = tables sql ++ [("",q)] }
binary op q1 q2
= SqlBin (toSqlOp op) q1 q2
--
This message contains information which may be confidential and privileged. Unless you are the
addressee (or authorized to receive for the addressee), you may not use, copy or disclose to anyone
the message or any information contained in the message. If you have received the message in error,
please advise the sender and delete the message. Thank you.
|