Re: [Bacula-devel] Postgresql pickiness again

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Tuesday 22 August 2006 15:22, Bill Moran wrote:
> In response to Kern Sibbald <ke...@si...>:
> 
> > On Tuesday 22 August 2006 14:23, Dan Langille wrote:
> > > On 22 Aug 2006 at 13:57, Kern Sibbald wrote:
> > > 
> > > > Thanks, I found a solution -- unfortunately to make it "PostgreSQL 
> > (POSIX?)" 
> > > > compliant meant making the query less efficient as it returns twice as 
> > much 
> > > > data as I need ...
> > > 
> > > Twice as many rows?  Duplicates?
> > 
> > Twice as many columns. PostgreSQL as opposed to MySQL forces me to return 
a 
> > field (column) I do not want:
> > 
> > const char *sql_jobids_from_client =
> >    "SELECT DISTINCT Job.JobId,Job.StartTime FROM Job,Pool"
> >    " WHERE Client.Name='%s' AND Pool.Name='%s' AND Job.PoolId=Pool.PoolId"
> >    " AND Job.ClientId=Client.ClientId "
> >    " ORDER by Job.StartTime";
> > 
> > I don't care to see StartTime, only that the JobId is sorted on it.
> > 
> > In MySQL,
> > 
> > const char *sql_jobids_from_client =
> >    "SELECT DISTINCT Job.JobId FROM Job,Pool"
> >    " WHERE Client.Name='%s' AND Pool.Name='%s' AND Job.PoolId=Pool.PoolId"
> >    " AND Job.ClientId=Client.ClientId "
> >    " ORDER by Job.StartTime";
> > 
> > works fine, and I am not smart enough to understand the nuances of why 
> > PostgreSQL (or POSIX or whoever) requires the column be returned in order 
to 
> > have is sorted.  Someone explained it, but for my feeble brain, it is just 
> > not logical, and this what make programming "correct" SQL far more 
difficult 
> > than programming C (for me).  If there are two JobIds with the same 
> > StartTime, which is never the case, I want the SQL to simply take the one 
> > with the smaller StartTime and throw the second one away (but I know that 
> > will never happen and unfortunately "correct" SQL as far as I can see has 
no 
> > clean way for me to do that -- a incomplete design decision IMO).
> 
> It's ANSI SQL (SQL99, I believe it's called) that's dictating this.  

Yes, thanks.

> My 
> understanding is how can you sort on something you don't use? 

Easy.  Here are the instructions:  Get records with all the following 
conditions xx and yy and zz, ...;  sort on StartTime; remove all duplicate 
records of JobId (this is called "uniq" in *nix scripting land); return 
JobId.  That's what I want to do. 

ANSI SQL makes it impossible to do exactly as I want -- or at least, let's say 
that no one has proposed a solution other than I *must* also get back 
StartTime (and maybe you have some idea with multiple selects below).

> In 
> PostgreSQL, the planner is going to analyze its information on the table
> to determine whether it makes more sense to sort the table first, then
> apply the distinct, or apply the distinct, then the sort.  Real world
> example:
> bacula=> explain select distinct job.jobid,job.starttime from 
job,pool,client
> bacula-> where client.name='mindwipe-fd' and pool.name='OffSiteRotation' and 
job.poolid=pool.poolid
> bacula-> AND Job.ClientId=Client.ClientId ORDER by Job.StartTime;
>                                      QUERY PLAN
> 
------------------------------------------------------------------------------------
>  Unique  (cost=20.93..20.95 rows=2 width=12)
>    ->  Sort  (cost=20.93..20.93 rows=2 width=12)
>          Sort Key: job.starttime, job.jobid
>          ->  Nested Loop  (cost=1.28..20.92 rows=2 width=12)
>                Join Filter: ("inner".poolid = "outer".poolid)
>                ->  Seq Scan on pool  (cost=0.00..1.09 rows=1 width=4)
>                      Filter: (name = 'OffSiteRotation'::text)
>                ->  Hash Join  (cost=1.28..19.61 rows=18 width=16)
>                      Hash Cond: ("outer".clientid = "inner".clientid)
>                      ->  Seq Scan on job  (cost=0.00..16.10 rows=410 
width=20)
>                      ->  Hash  (cost=1.27..1.27 rows=1 width=4)
>                            ->  Seq Scan on client  (cost=0.00..1.27 rows=1 
width=4)
>                                  Filter: (name = 'mindwipe-fd'::text)
> (13 rows)
> 
> Notice that the unique is the last thing PostgreSQL does.  Makes sense, but
> what if the contents of the table were different in a way that it would be
> faster to apply the distinct first -- then it would be impossible to sort.
> 
> My instinct in PostgreSQL would be to alter the query to contain a 
subselect,

Thanks, but that's more complicated than I want to get right now.

> which would return the correct # of columns.  Does MySQL support subselects
> yet?

Yes, for a long time now (at least several years).

> 
> -- 
> Bill Moran
> Collaborative Fusion Inc.
> 
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job 
easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Bacula-devel mailing list
> Bac...@li...
> https://lists.sourceforge.net/lists/listinfo/bacula-devel
>