Thread: [Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

Brought to you by: delagoya, jcrabtree, msaffitz, stevefischer, stoeckert

gusdev-gusdev

[Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

From: Sucheta T. <su...@vb...> - 2005-02-23 15:23:10

Folks,

I don't know if any of you encountered this problem, but for us, it has 
been a major bottleneck to query the similarity and similarityspan tables 
to get the desirable output.

For example if I say a query:

$sql="select  /*+CHOOSE*/   distinct ss.query_start, 
ss.query_end,ss.is_reversed
,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 'e' 
|| TO
_CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id  from dots.similarity s
, dots.similarityspan ss, dots.transcript t, dots.externalaasequence eaa  where
t.na_feature_id=464205 and t.gene=to_char(s.query_id) and 
s.similarity_id=ss.sim
ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by 
s.similarity_
id";

It takes atleast 5 minutes. One thing to note here is here for the 
transcript corresponding to the na_feature_id is blasted against only one 
database, so the query is quite straight forward. But there are instances 
where the scaffolds are blasted aganst 23 different organisms. In that 
case, in order to obtain the value against one particular database, I use 
much nested query going and getting the value from externaldatabase and 
externaldatabaserelease and then using it in similarity table.

I was wondering if anyone has any efficient way of querying these tables.

Thanks

Sucheta

Re: [Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

From: Michael S. <msa...@pc...> - 2005-02-23 15:33:57

Hi Sucheta,

A few general query optimization points:

1) It's best to start by looking at the query execution plan, to 
determine what exactly is going on.  In Oracle, you'll need to create 
the necessary plan table, and then use the explain plan for... command. 
  Refer to Oracle's documentation for details.

2) The plan may reveal some ways in which you can speed up the query. 
For example, if the query is resulting in several full table scans of 
very large tables, it's very likely that an index on the affected 
columns will improve performance.

3) If the plan looks acceptable, but the query is still slow, then you 
may need to dig deeper.  If it's purely an issue of a complex query over 
  a large amount of data, then a materialized view may help (if the 
query is run frequently).  Otherwise, you may need to examine where the 
bottleneck is coming from-- is it cpu, disk, or something else (i.e. 
such as other user's use of the system)?

Ultimately, sometimes a query is just slow.  Our similarity table is 
many, many gigabytes (I believe upwards of 30), and any complex query on 
that amount of data is going to be slow-- there's only so much 
optimization that can occur.

Hope this helps,

Mike

Sucheta Tripathy wrote:
> Folks,
> 
> I don't know if any of you encountered this problem, but for us, it has 
> been a major bottleneck to query the similarity and similarityspan 
> tables to get the desirable output.
> 
> For example if I say a query:
> 
> $sql="select  /*+CHOOSE*/   distinct ss.query_start, 
> ss.query_end,ss.is_reversed
> ,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 
> 'e' || TO
> _CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id  from 
> dots.similarity s
> , dots.similarityspan ss, dots.transcript t, dots.externalaasequence 
> eaa  where
> t.na_feature_id=464205 and t.gene=to_char(s.query_id) and 
> s.similarity_id=ss.sim
> ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by 
> s.similarity_
> id";
> 
> It takes atleast 5 minutes. One thing to note here is here for the 
> transcript corresponding to the na_feature_id is blasted against only 
> one database, so the query is quite straight forward. But there are 
> instances where the scaffolds are blasted aganst 23 different organisms. 
> In that case, in order to obtain the value against one particular 
> database, I use much nested query going and getting the value from 
> externaldatabase and externaldatabaserelease and then using it in 
> similarity table.
> 
> I was wondering if anyone has any efficient way of querying these tables.
> 
> Thanks
> 
> Sucheta
> 
> 
> 
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

Re: [Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

From: Sucheta T. <su...@vb...> - 2005-02-23 15:40:25

Hi Mike,

Thanks for the tips. In fact here at VBI, we have the DBA team helping us 
out in checking the performance issues, and after doing some research 
Sally(one of our DBA) informed me that the large dataset may be the cause 
of this. Probably one option left for us is to have materialized views on 
these tables.

Sucheta


At 10:33 AM 2/23/2005 -0500, Michael Saffitz wrote:

>Hi Sucheta,
>
>A few general query optimization points:
>
>1) It's best to start by looking at the query execution plan, to determine 
>what exactly is going on.  In Oracle, you'll need to create the necessary 
>plan table, and then use the explain plan for... command.  Refer to 
>Oracle's documentation for details.
>
>2) The plan may reveal some ways in which you can speed up the query. For 
>example, if the query is resulting in several full table scans of very 
>large tables, it's very likely that an index on the affected columns will 
>improve performance.
>
>3) If the plan looks acceptable, but the query is still slow, then you may 
>need to dig deeper.  If it's purely an issue of a complex query over  a 
>large amount of data, then a materialized view may help (if the query is 
>run frequently).  Otherwise, you may need to examine where the bottleneck 
>is coming from-- is it cpu, disk, or something else (i.e. such as other 
>user's use of the system)?
>
>Ultimately, sometimes a query is just slow.  Our similarity table is many, 
>many gigabytes (I believe upwards of 30), and any complex query on that 
>amount of data is going to be slow-- there's only so much optimization 
>that can occur.
>
>Hope this helps,
>
>Mike
>
>
>Sucheta Tripathy wrote:
>>Folks,
>>I don't know if any of you encountered this problem, but for us, it has 
>>been a major bottleneck to query the similarity and similarityspan tables 
>>to get the desirable output.
>>For example if I say a query:
>>$sql="select  /*+CHOOSE*/   distinct ss.query_start, 
>>ss.query_end,ss.is_reversed
>>,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 
>>'e' || TO
>>_CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id  from 
>>dots.similarity s
>>, dots.similarityspan ss, dots.transcript t, dots.externalaasequence 
>>eaa  where
>>t.na_feature_id=464205 and t.gene=to_char(s.query_id) and 
>>s.similarity_id=ss.sim
>>ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by 
>>s.similarity_
>>id";
>>It takes atleast 5 minutes. One thing to note here is here for the 
>>transcript corresponding to the na_feature_id is blasted against only one 
>>database, so the query is quite straight forward. But there are instances 
>>where the scaffolds are blasted aganst 23 different organisms. In that 
>>case, in order to obtain the value against one particular database, I use 
>>much nested query going and getting the value from externaldatabase and 
>>externaldatabaserelease and then using it in similarity table.
>>I was wondering if anyone has any efficient way of querying these tables.
>>Thanks
>>Sucheta
>>
>>-------------------------------------------------------
>>SF email is sponsored by - The IT Product Guide
>>Read honest & candid reviews on hundreds of IT Products from real users.
>>Discover which products truly live up to the hype. Start reading now.
>>http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>>_______________________________________________
>>Gusdev-gusdev mailing list
>>Gus...@li...
>>https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

Re: [Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

From: Angel P. <an...@ma...> - 2005-02-23 15:40:10

Attachments: show-explain.sql

Sucheta,
It is always a good idea to include an the output of "EXPLAIN PLAN" when 
asking how to optimize queries. Usually well-placed indexes will take 
care of long running queries.

*EXPLAIN PLAN* [*SET STATEMENT_ID* = 'text']

[*INTO* [schema .] table [@ dblink]] *FOR* statement;

Output defaults to a plan table in your schema, unless you give it a 
plan table to insert into. Oracle has a script for creating this table, 
|UTLXPLAN.SQL|
Attached is a SQL script to output the results from the plan table.

Postgres must have some similar functionality.
Angel

Sucheta Tripathy wrote:

> Folks,
>
> I don't know if any of you encountered this problem, but for us, it 
> has been a major bottleneck to query the similarity and similarityspan 
> tables to get the desirable output.
>
> For example if I say a query:
>
> $sql="select  /*+CHOOSE*/   distinct ss.query_start, 
> ss.query_end,ss.is_reversed
> ,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 
> 'e' || TO
> _CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id  from 
> dots.similarity s
> , dots.similarityspan ss, dots.transcript t, dots.externalaasequence 
> eaa  where
> t.na_feature_id=464205 and t.gene=to_char(s.query_id) and 
> s.similarity_id=ss.sim
> ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by 
> s.similarity_
> id";
>
> It takes atleast 5 minutes. One thing to note here is here for the 
> transcript corresponding to the na_feature_id is blasted against only 
> one database, so the query is quite straight forward. But there are 
> instances where the scaffolds are blasted aganst 23 different 
> organisms. In that case, in order to obtain the value against one 
> particular database, I use much nested query going and getting the 
> value from externaldatabase and externaldatabaserelease and then using 
> it in similarity table.
>
> I was wondering if anyone has any efficient way of querying these tables.
>
> Thanks
>
> Sucheta
>
>
>
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev



-- 

Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160

P: 215-573-3736
F: 215-573-9004