From: Peter G. <pe...@gm...> - 2008-04-03 04:48:25
|
Bjorn, I'm confused by the "unique" function. Why does it not just use SELECT DISTINCT <blah>? GROUP BY semantics are filthy. What is the intention with HaskellDB as far as safety goes? I have a query now that causes PostgreSQL to complain that I'm not doing enough GROUP BYing. I don't want to use GROUP BY at all, it's unnecessary for my query. (I actually want to use DISTINCT, but am busily hacking around needing either.) Where do these come from? I can easily send you the generated SQL but it is not easy to isolate the query itself; roughly I am trying to pass an aggregation function around and when I use it in a record construction I get this problem. Here's a plan: Add a new function: project' :: [Opt] -> ... blah, same as project where data Opt = DISTINCT | LIMIT Int | OFFSET Int I want to specify that these Opts apply to *this* projection, i.e. we need to track the column names passed to project'. I propose to do this by hacking the Project constructor in PrimQuery: data PrimQuery = ... | Project Assoc PrimQuery | ... becomes data PrimQuery = ... | Project [Opt] Assoc Assoc PrimQuery | ... The first Assoc is the specific fields the Opts apply to, the other assoc is as it is now. Hopefully the changes to SQL generation are obvious. Do you have an opinion on any of this? Is this on the right track? I definitely need LIMIT and OFFSET as they seem to be the only way to efficiently limit the result sets in HOPE. If you have other ideas, please share. Are you interested in fixing the UTF8 issues in HSQL and HaskellDB? If so, I'll send you some patches to HSQL, and later for HaskellDB. I've only hacked HSQL and HSQL-PostgreSQL, so it would need some love from someone else for the other backends. I can help with that, but have little time to do it myself. cheers pete |
From: Justin B. <jgb...@gm...> - 2008-04-03 17:08:30
|
On Wed, Apr 2, 2008 at 9:48 PM, Peter Gammie <pe...@gm...> wrote: > Bjorn, > > I'm confused by the "unique" function. Why does it not just use SELECT > DISTINCT <blah>? GROUP BY semantics are filthy. I can speak to unique, as I provided the patch which implemented it (http://tinyurl.com/2u9hmf). When I started with haskelldb, it added "distinct" to all queries. That behavior didn't work for what I wanted to do. I coded unique so 'distinct' behavior was still accessible, but wasn't the default. My experience with SQL has been to use GROUP BY over DISTINCT, which is why it generates the code it does. What makes GROUP BY filthy? > roughly I am trying to pass an aggregation function around and when I > use it in a record construction I get this problem. I ran a problem when passing around aggregates recently that didn't group quite correctly - the SQL was correct but the query result was not. It's referenced in this email (http://tinyurl.com/33apbb) - are you seeing the same thing? > > Here's a plan: Add a new function: > > project' :: [Opt] -> ... blah, same as project > > where > > data Opt = DISTINCT | LIMIT Int | OFFSET Int > > I want to specify that these Opts apply to *this* projection, i.e. we > need to track the column names passed to project'. I propose to do > this by hacking the Project constructor in PrimQuery: > > data PrimQuery = ... | Project Assoc PrimQuery | ... > > becomes > > data PrimQuery = ... | Project [Opt] Assoc Assoc PrimQuery | ... Have you considered wrangling the existing Special value to do this? It already handles "TOP" and "ORDER". However, ORDER BY seems to get propogated out to the outermost select so maybe it's not feasible. Justin |
From: Peter G. <pe...@gm...> - 2008-04-07 11:10:26
|
To kick the horse one more time: On 04/04/2008, at 12:08 AM, Justin Bailey wrote: > [...] What makes GROUP BY filthy? It forces me to enumerate fields. I can't say: SELECT * FROM ... GROUP BY something-star-like For comparison, here's a relatively-GROUP BY-free rendition of that previously-send monstrosity: SELECT ID.*, T.C FROM Items_Descriptions AS ID, (SELECT IT.tag AS T, count(IT.item_id) AS C FROM Items_Tags AS IT GROUP BY IT.tag) AS T WHERE ID.lang=$1 AND ID.item_id=T.T param $1 = 'en' As the Items_Tags table has 2 columns, I'm not too bothered about that GROUP BY. Bjorn, congrats on starting a company. cheers peter |
From: Peter G. <pe...@gm...> - 2008-04-04 05:32:37
|
Justin, On 04/04/2008, at 12:08 AM, Justin Bailey wrote: > On Wed, Apr 2, 2008 at 9:48 PM, Peter Gammie <pe...@gm...> > wrote: >> I'm confused by the "unique" function. Why does it not just use >> SELECT >> DISTINCT <blah>? GROUP BY semantics are filthy. > > I can speak to unique, as I provided the patch which implemented it > (http://tinyurl.com/2u9hmf). When I started with haskelldb, it added > "distinct" to all queries. That behavior didn't work for what I wanted > to do. I coded unique so 'distinct' behavior was still accessible, but > wasn't the default. My experience with SQL has been to use GROUP BY > over DISTINCT, which is why it generates the code it does. What makes > GROUP BY filthy? It's filthy because it wasn't working for me. :-) I'm no SQL expert so I won't argue with you further. >> roughly I am trying to pass an aggregation function around and when I >> use it in a record construction I get this problem. > > I ran a problem when passing around aggregates recently that didn't > group quite correctly - the SQL was correct but the query result was > not. It's referenced in this email (http://tinyurl.com/33apbb) - are > you seeing the same thing? Sorry, the problem has been debugged out of existence, so I no longer know what precisely the problem was. In my case the SQL was definitely incorrect. I still have that (I think) if you want to see it. I've decided to abandon HaskellDB as there are just a few things I can't get it to do, and I'm running out of time. It's easier (and perhaps no less safe?) for me to just use HSQL directly. In any case the number of database accesses will be much reduced. Bjorn: I've added parameterised executions and queries to HSQL and HSQL/PostgreSQL. Do you want these patches? One advantage is that one does not need to do any string escaping. cheers peter |
From: Justin B. <jgb...@gm...> - 2008-04-04 15:09:55
|
On Thu, Apr 3, 2008 at 10:32 PM, Peter Gammie <pe...@gm...> wrote: > Sorry, the problem has been debugged out of existence, so I no longer know > what precisely the problem was. In my case the SQL was definitely incorrect. > I still have that (I think) if you want to see it. Please! > Bjorn: I've added parameterised executions and queries to HSQL and > HSQL/PostgreSQL. Do you want these patches? One advantage is that one does > not need to do any string escaping. I just submitted a patch that allows parameters to be used in queries, but I didn't do any work on actually executing them (since I am only intersted in the generated SQL). I'm also using HDBC over HSQL. However, maybe my patch will work for you? If both are accepted I'd be willing to try and get them working together ... Justin |
From: Peter G. <pe...@gm...> - 2008-04-05 04:05:56
|
On 04/04/2008, at 10:09 PM, Justin Bailey wrote: > On Thu, Apr 3, 2008 at 10:32 PM, Peter Gammie <pe...@gm...> > wrote: >> Sorry, the problem has been debugged out of existence, so I no >> longer know >> what precisely the problem was. In my case the SQL was definitely >> incorrect. >> I still have that (I think) if you want to see it. > > Please! OK, here it is, it's huge: SELECT item_id6 as item_id, parent6 as parent, root6 as root, date_posted6 as date_posted, date_last_update6 as date_last_update, item_type6 as item_type, lang6 as lang, owner6 as owner, title6 as title, summary6 as summary, contents6 as contents, intfield6 as intfield FROM (SELECT item_id5 as item_id6, parent3 as parent6, root3 as root6, date_posted3 as date_posted6, date_last_update3 as date_last_update6, item_type5 as item_type6, lang5 as lang6, owner5 as owner6, title5 as title6, summary5 as summary6, contents5 as contents6, COUNT(item_id4) as intfield6 FROM (SELECT item_id as item_id4, tag as tag4 FROM items_tags as T1) as T1, (SELECT item_id2 as item_id3, parent1 as parent3, root1 as root3, date_posted1 as date_posted3, date_last_update1 as date_last_update3, item_type2 as item_type3, title2 as title3 FROM (SELECT item_id as item_id2, item_type as item_type2, lang as lang2, title as title2, date_posted as date_posted1, date_last_update as date_last_update1 FROM items_descriptions as T1) as T1, (SELECT item_id as item_id1, parent as parent1, root as root1 FROM items as T1) as T2 WHERE ((item_type2 = 'tag') AND (lang2 = 'en')) AND (item_id1 = item_id2)) as T2, (SELECT item_id as item_id5, item_type as item_type5, lang as lang5, owner as owner5, title as title5, summary as summary5, contents as contents5 FROM items_descriptions as T1) as T3 WHERE (item_id3 = item_id5) AND (item_id3 = tag4) GROUP BY item_id6, parent6, root6, date_posted6, date_last_update6, item_type6, lang6, owner6, title6, summary6, contents6 ORDER BY item_type3 ASC, title3 ASC) as T1 Putting this into PostgreSQL's psql, I get: ERROR: column "t2.item_type3" must appear in the GROUP BY clause or be used in an aggregate function I can send you the SQL that creates the database if you want to play with it more. cheers peter |