Thread: [Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

Status: Alpha

Brought to you by: ahockersten, bring, el_pollo, m4dc4p, viblo

haskelldb-users

[Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

From: Peter G. <pe...@gm...> - 2008-04-03 04:48:25

Bjorn,

I'm confused by the "unique" function. Why does it not just use SELECT  
DISTINCT <blah>? GROUP BY semantics are filthy.

What is the intention with HaskellDB as far as safety goes? I have a  
query now that causes PostgreSQL to complain that I'm not doing enough  
GROUP BYing. I don't want to use GROUP BY at all, it's unnecessary for  
my query. (I actually want to use DISTINCT, but am busily hacking  
around needing either.) Where do these come from? I can easily send  
you the generated SQL but it is not easy to isolate the query itself;  
roughly I am trying to pass an aggregation function around and when I  
use it in a record construction I get this problem.

Here's a plan: Add a new function:

project' :: [Opt] -> ... blah, same as project

where

data Opt = DISTINCT | LIMIT Int | OFFSET Int

I want to specify that these Opts apply to *this* projection, i.e. we  
need to track the column names passed to project'. I propose to do  
this by hacking the Project constructor in PrimQuery:

data PrimQuery = ... | Project Assoc PrimQuery | ...

becomes

data PrimQuery = ... | Project [Opt] Assoc Assoc PrimQuery | ...

The first Assoc is the specific fields the Opts apply to, the other  
assoc is as it is now. Hopefully the changes to SQL generation are  
obvious.

Do you have an opinion on any of this? Is this on the right track? I  
definitely need LIMIT and OFFSET as they seem to be the only way to  
efficiently limit the result sets in HOPE. If you have other ideas,  
please share.

Are you interested in fixing the UTF8 issues in HSQL and HaskellDB? If  
so, I'll send you some patches to HSQL, and later for HaskellDB. I've  
only hacked HSQL and HSQL-PostgreSQL, so it would need some love from  
someone else for the other backends. I can help with that, but have  
little time to do it myself.

cheers
pete

Re: [Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

From: Justin B. <jgb...@gm...> - 2008-04-03 17:08:30

On Wed, Apr 2, 2008 at 9:48 PM, Peter Gammie <pe...@gm...> wrote:
> Bjorn,
>
>  I'm confused by the "unique" function. Why does it not just use SELECT
>  DISTINCT <blah>? GROUP BY semantics are filthy.

I can speak to unique, as I provided the patch which implemented it
(http://tinyurl.com/2u9hmf). When I started with haskelldb, it added
"distinct" to all queries. That behavior didn't work for what I wanted
to do. I coded unique so 'distinct' behavior was still accessible, but
wasn't the default. My experience with SQL has been to use GROUP BY
over DISTINCT, which is why it generates the code it does. What makes
GROUP BY filthy?

>  roughly I am trying to pass an aggregation function around and when I
>  use it in a record construction I get this problem.

I ran a problem when passing around aggregates recently that didn't
group quite correctly - the SQL was correct but the query result was
not. It's referenced in this email (http://tinyurl.com/33apbb) - are
you seeing the same thing?

>
>  Here's a plan: Add a new function:
>
>  project' :: [Opt] -> ... blah, same as project
>
>  where
>
>  data Opt = DISTINCT | LIMIT Int | OFFSET Int
>
>  I want to specify that these Opts apply to *this* projection, i.e. we
>  need to track the column names passed to project'. I propose to do
>  this by hacking the Project constructor in PrimQuery:
>
>  data PrimQuery = ... | Project Assoc PrimQuery | ...
>
>  becomes
>
>  data PrimQuery = ... | Project [Opt] Assoc Assoc PrimQuery | ...

Have you considered wrangling the existing Special value to do this?
It already handles "TOP" and "ORDER". However, ORDER BY seems to get
propogated out to the outermost select so maybe it's not feasible.

Justin

Re: [Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

From: Peter G. <pe...@gm...> - 2008-04-07 11:10:26

To kick the horse one more time:

On 04/04/2008, at 12:08 AM, Justin Bailey wrote:

> [...] What makes GROUP BY filthy?

It forces me to enumerate fields. I can't say:

SELECT * FROM ... GROUP BY something-star-like

For comparison, here's a relatively-GROUP BY-free rendition of that  
previously-send monstrosity:

SELECT ID.*, T.C FROM Items_Descriptions AS ID,
         (SELECT IT.tag AS T, count(IT.item_id) AS C
             FROM Items_Tags AS IT
            GROUP BY IT.tag) AS T
         WHERE ID.lang=$1
           AND ID.item_id=T.T

param $1 = 'en'

As the Items_Tags table has 2 columns, I'm not too bothered about that  
GROUP BY.

Bjorn, congrats on starting a company.

cheers
peter

Re: [Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

From: Peter G. <pe...@gm...> - 2008-04-04 05:32:37

Justin,

On 04/04/2008, at 12:08 AM, Justin Bailey wrote:

> On Wed, Apr 2, 2008 at 9:48 PM, Peter Gammie <pe...@gm...>  
> wrote:
>> I'm confused by the "unique" function. Why does it not just use  
>> SELECT
>> DISTINCT <blah>? GROUP BY semantics are filthy.
>
> I can speak to unique, as I provided the patch which implemented it
> (http://tinyurl.com/2u9hmf). When I started with haskelldb, it added
> "distinct" to all queries. That behavior didn't work for what I wanted
> to do. I coded unique so 'distinct' behavior was still accessible, but
> wasn't the default. My experience with SQL has been to use GROUP BY
> over DISTINCT, which is why it generates the code it does. What makes
> GROUP BY filthy?

It's filthy because it wasn't working for me. :-)

I'm no SQL expert so I won't argue with you further.

>> roughly I am trying to pass an aggregation function around and when I
>> use it in a record construction I get this problem.
>
> I ran a problem when passing around aggregates recently that didn't
> group quite correctly - the SQL was correct but the query result was
> not. It's referenced in this email (http://tinyurl.com/33apbb) - are
> you seeing the same thing?

Sorry, the problem has been debugged out of existence, so I no longer  
know what precisely the problem was. In my case the SQL was definitely  
incorrect. I still have that (I think) if you want to see it.

I've decided to abandon HaskellDB as there are just a few things I  
can't get it to do, and I'm running out of time. It's easier (and  
perhaps no less safe?) for me to just use HSQL directly. In any case  
the number of database accesses will be much reduced.

Bjorn: I've added parameterised executions and queries to HSQL and  
HSQL/PostgreSQL. Do you want these patches? One advantage is that one  
does not need to do any string escaping.

cheers
peter

Re: [Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

From: Justin B. <jgb...@gm...> - 2008-04-04 15:09:55

On Thu, Apr 3, 2008 at 10:32 PM, Peter Gammie <pe...@gm...> wrote:
>  Sorry, the problem has been debugged out of existence, so I no longer know
> what precisely the problem was. In my case the SQL was definitely incorrect.
> I still have that (I think) if you want to see it.

Please!

>  Bjorn: I've added parameterised executions and queries to HSQL and
> HSQL/PostgreSQL. Do you want these patches? One advantage is that one does
> not need to do any string escaping.

I just submitted a patch that allows parameters to be used in queries,
but I didn't do any work on actually executing them (since I am only
intersted in the generated SQL). I'm also using HDBC over HSQL.
However, maybe my patch will work for you?

If both are accepted I'd be willing to try and get them working together ...

Justin

Re: [Haskelldb-users] haskelldb, LIMIT, OFFSET, DISTINCT

From: Peter G. <pe...@gm...> - 2008-04-05 04:05:56

On 04/04/2008, at 10:09 PM, Justin Bailey wrote:

> On Thu, Apr 3, 2008 at 10:32 PM, Peter Gammie <pe...@gm...>  
> wrote:
>> Sorry, the problem has been debugged out of existence, so I no  
>> longer know
>> what precisely the problem was. In my case the SQL was definitely  
>> incorrect.
>> I still have that (I think) if you want to see it.
>
> Please!

OK, here it is, it's huge:

SELECT item_id6 as item_id,
        parent6 as parent,
        root6 as root,
        date_posted6 as date_posted,
        date_last_update6 as date_last_update,
        item_type6 as item_type,
        lang6 as lang,
        owner6 as owner,
        title6 as title,
        summary6 as summary,
        contents6 as contents,
        intfield6 as intfield
FROM (SELECT item_id5 as item_id6,
              parent3 as parent6,
              root3 as root6,
              date_posted3 as date_posted6,
              date_last_update3 as date_last_update6,
              item_type5 as item_type6,
              lang5 as lang6,
              owner5 as owner6,
              title5 as title6,
              summary5 as summary6,
              contents5 as contents6,
              COUNT(item_id4) as intfield6
       FROM (SELECT item_id as item_id4,
                    tag as tag4
             FROM items_tags as T1) as T1,
            (SELECT item_id2 as item_id3,
                    parent1 as parent3,
                    root1 as root3,
                    date_posted1 as date_posted3,
                    date_last_update1 as date_last_update3,
                    item_type2 as item_type3,
                    title2 as title3
             FROM (SELECT item_id as item_id2,
                          item_type as item_type2,
                          lang as lang2,
                          title as title2,
                          date_posted as date_posted1,
                          date_last_update as date_last_update1
                   FROM items_descriptions as T1) as T1,
                  (SELECT item_id as item_id1,
                          parent as parent1,
                          root as root1
                   FROM items as T1) as T2
             WHERE ((item_type2 = 'tag') AND (lang2 = 'en')) AND  
(item_id1 = item_id2)) as T2,
            (SELECT item_id as item_id5,
                    item_type as item_type5,
                    lang as lang5,
                    owner as owner5,
                    title as title5,
                    summary as summary5,
                    contents as contents5
             FROM items_descriptions as T1) as T3
       WHERE (item_id3 = item_id5) AND (item_id3 = tag4)
       GROUP BY item_id6,
                parent6,
                root6,
                date_posted6,
                date_last_update6,
                item_type6,
                lang6,
                owner6,
                title6,
                summary6,
                contents6
       ORDER BY item_type3 ASC,
                title3 ASC) as T1

Putting this into PostgreSQL's psql, I get:

ERROR:  column "t2.item_type3" must appear in the GROUP BY clause or  
be used in an aggregate function

I can send you the SQL that creates the database if you want to play  
with it more.

cheers
peter