Re: [Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Sucheta,

A few general query optimization points:

1) It's best to start by looking at the query execution plan, to 
determine what exactly is going on.  In Oracle, you'll need to create 
the necessary plan table, and then use the explain plan for... command. 
  Refer to Oracle's documentation for details.

2) The plan may reveal some ways in which you can speed up the query. 
For example, if the query is resulting in several full table scans of 
very large tables, it's very likely that an index on the affected 
columns will improve performance.

3) If the plan looks acceptable, but the query is still slow, then you 
may need to dig deeper.  If it's purely an issue of a complex query over 
  a large amount of data, then a materialized view may help (if the 
query is run frequently).  Otherwise, you may need to examine where the 
bottleneck is coming from-- is it cpu, disk, or something else (i.e. 
such as other user's use of the system)?

Ultimately, sometimes a query is just slow.  Our similarity table is 
many, many gigabytes (I believe upwards of 30), and any complex query on 
that amount of data is going to be slow-- there's only so much 
optimization that can occur.

Hope this helps,

Mike

Sucheta Tripathy wrote:
> Folks,
> 
> I don't know if any of you encountered this problem, but for us, it has 
> been a major bottleneck to query the similarity and similarityspan 
> tables to get the desirable output.
> 
> For example if I say a query:
> 
> $sql="select  /*+CHOOSE*/   distinct ss.query_start, 
> ss.query_end,ss.is_reversed
> ,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 
> 'e' || TO
> _CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id  from 
> dots.similarity s
> , dots.similarityspan ss, dots.transcript t, dots.externalaasequence 
> eaa  where
> t.na_feature_id=464205 and t.gene=to_char(s.query_id) and 
> s.similarity_id=ss.sim
> ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by 
> s.similarity_
> id";
> 
> It takes atleast 5 minutes. One thing to note here is here for the 
> transcript corresponding to the na_feature_id is blasted against only 
> one database, so the query is quite straight forward. But there are 
> instances where the scaffolds are blasted aganst 23 different organisms. 
> In that case, in order to obtain the value against one particular 
> database, I use much nested query going and getting the value from 
> externaldatabase and externaldatabaserelease and then using it in 
> similarity table.
> 
> I was wondering if anyone has any efficient way of querying these tables.
> 
> Thanks
> 
> Sucheta
> 
> 
> 
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev