From: Jeremy S. <jer...@li...> - 2005-01-24 19:12:58
|
Hello, I hacked around with the optimizer and sql generator so that many sub-queries get rewritten as joins. I think that we should be able to rewrite most, if not all, queries with sub-queries. (In general, not all sql statements can be rewritten with out sub-queries, but I think that the subset of relational algebra currently implement by haskelldb does not have any cases that require sub-queries. ) That is good because, (1) mysql 4.0 does not support sub-queries (2) mysql 4.1 does, but they are painfully slow. For example, the following query takes around 10 seconds when generated as a sub-query and 0.2 seconds with the subqueries removed. (There are around ~2000 bugs that block bug 7178). taskQuery = do d <- table D.dependencies b <- table B.bugs restrict (d!D.blocked .==. constant 7178) restrict (b!B.bug_id .==. d!D.dependson) project ( B.bug_id << b!B.bug_id # B.assigned_to << b!B.assigned_to # B.bug_status << b!B.bug_status # B.short_desc << b!B.short_desc # B.priority << b!B.priority ) I have not gotten around to finishing and verifying the correctness of this code, so it should be considered experimental. But I wanted to get some early feedback in case I am totally screwing up ;) If you have access to the book (hint: college library): Database System Implementations by Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom http://www.amazon.com/exec/obidos/tg/detail/-/0130402648/102-9632978-2282521?v=glance I highly recommend you read the beginnings of Chapters 6 and 7. In Chapter 6, they defined a variation of relational algebra as on algebra on bags instead of sets. They also extend traditional relational algebra to include operators such as group-by, order-by, and select distinct. In chapter 7 they introduce the two-operator select, and a bunch of algebraic laws for transforming queries. Note that the laws are defined for bags, not sets. I wonder if adopting the relational algebra defined in this book could prove to be an important step in making haskelldb more practical for the real world. It seems like a fairly straightward project if someone is looking for a class project, or just has some time to kill. The biggest 'problem' with the book is it is intended for people who want to write a database. So, they are assuming SQL->Relational Algebra->Optimized Physical Query Plan. We are interested in the oppsition direction, Relation Algebra->SQL. This just means that some of the goals of optimizing are counter-productive. For example, in SQL->RA, it is beneficial to push the projections to the bottom layer. But the patch I submit, I actually attempt to float the projections to the top because it allows me to write a simlper query that MySQL can execute faster. Quick Overview of what the patch does: In the SQL generator I add a new special case: binary Times (SqlTable t1) (SqlTable t2) = newSelect { tables = [("",SqlTable t1),("",SqlTable t2)] } This causes the generator to generate a join in the case where you have the project of two tables. For example, do t1 <- table table1 t2 <- table table2 project ( <everything> ... ) would be written as: SELECT * FROM table1, table2; Then in the optimizer, I attempt to float projections upward, so that all of my products are the product of two simple tables. When it floats the projections, it will also automatically merge adjacent projections. It think this may also have the side effect of removing the need for removeEmpty and removeDead, but now I can't remember :) I suspect I also disabled removeRestrict, because it will try to push things below the product, which is the exact opposite of what I want it most cases. I think we only want to push restrictions down, if the query has an unremovable two operator select (and, as far as I can tell, haskelldb does not have a two operator select). Jeremy Shaw. ps. This patch is against 0.8, because that is the last version that built for me :) * modified files --- orig/src/Optimize.hs +++ mod/src/Optimize.hs @@ -15,7 +15,7 @@ -- $Revision: 1.16 $ ----------------------------------------------------------- -module Database.HaskellDB.Optimize (optimize) where +module Database.HaskellDB.Optimize (optimize, floatProject, mergeProject) where import Control.Exception (assert) import Data.List (intersect) @@ -24,9 +24,10 @@ -- | Optimize a PrimQuery optimize :: PrimQuery -> PrimQuery optimize = mergeProject - . removeEmpty - . removeDead - . pushRestrict + . floatProject +-- . removeEmpty +-- . removeDead +-- . pushRestrict -- | Remove unused attributes from projections. removeDead :: PrimQuery -> PrimQuery @@ -160,6 +161,40 @@ safe assoc = not (any (nestedAggregate.snd) assoc) +floatProject :: PrimQuery -> PrimQuery + +-- | Does not check for conflicting names +floatProject (Binary Times query1 query2) = + let query1' = mergeProject $ floatProject query1 + query2' = mergeProject $ floatProject query2 + in + case (query1',query2') of + ((Project assoc1 q1),(Project assoc2 q2)) -> Project (assoc1 ++ assoc2) (Binary Times q1 q2) + ((Project assoc1 q1),q2) -> Project assoc1 (Binary Times q1 q2) + (q1,(Project assoc2 q2)) -> Project assoc2 (Binary Times q1 q2) + (q1,q2) -> Binary Times q1 q2 + +-- | Have not figured out what other binary ops I can float past +floatProject (Binary op query1 query2) = Binary op (floatProject query1) (floatProject query2) + +floatProject (Project assoc query) = (Project assoc (floatProject query)) + +-- | float the project past a restrict +floatProject (Restrict x query) = + case floatProject query of + Project assoc q -> + let x' = substAttr assoc x + in + (Project assoc (floatProject (Restrict x' q))) + q -> (Restrict x q) + +-- | Have not figured out if/when I can float past a special op +floatProject (Special op query) = (Special op (floatProject query)) + +-- | Termination case +floatProject q@(BaseTable _ _) = q + + -- | Push restrictions down through projections and binary ops. pushRestrict :: PrimQuery -> PrimQuery --- orig/src/Sql.hs +++ mod/src/Sql.hs @@ -54,6 +54,7 @@ | SqlBin String SqlSelect SqlSelect | SqlTable TableName -- ^ Select a whole table named TableName. | SqlEmpty -- ^ Empty select. + deriving Show -- | Data type representing the SQL UPDATE statement. @@ -131,12 +132,16 @@ sql = toSelect q -- binary assumes that q1 and q2 are not empty - binary Times q1 q2 - | null (attrs q1) = addTable q1 q2 - | null (attrs q2) = addTable q2 q1 - | otherwise = newSelect { tables = [("",q1),("",q2)] } - where - addTable sql q = sql{ tables = tables sql ++ [("",q)] } + binary Times (SqlTable t1) (SqlTable t2) = newSelect { tables = [("",SqlTable t1),("",SqlTable t2)] } + binary Times q1 q2 + | null (criteria q1) && null (criteria q2) = q1 { tables = tables q1 ++ tables q2 + , attrs = attrs q1 ++ attrs q2 + } + | null (attrs q1) = addTable q1 q2 + | null (attrs q2) = addTable q2 q1 + | otherwise = newSelect { tables = [("",q1),("",q2)] } + where + addTable sql q = sql{ tables = tables sql ++ [("",q)] } binary op q1 q2 = SqlBin (toSqlOp op) q1 q2 -- This message contains information which may be confidential and privileged. Unless you are the addressee (or authorized to receive for the addressee), you may not use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender and delete the message. Thank you. |