Re: [Gusdev-gusdev] performance issues in querying dots.similarity/dots.similarityspan tables

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Mike,

Thanks for the tips. In fact here at VBI, we have the DBA team helping us 
out in checking the performance issues, and after doing some research 
Sally(one of our DBA) informed me that the large dataset may be the cause 
of this. Probably one option left for us is to have materialized views on 
these tables.

Sucheta

At 10:33 AM 2/23/2005 -0500, Michael Saffitz wrote:

>Hi Sucheta,
>
>A few general query optimization points:
>
>1) It's best to start by looking at the query execution plan, to determine 
>what exactly is going on.  In Oracle, you'll need to create the necessary 
>plan table, and then use the explain plan for... command.  Refer to 
>Oracle's documentation for details.
>
>2) The plan may reveal some ways in which you can speed up the query. For 
>example, if the query is resulting in several full table scans of very 
>large tables, it's very likely that an index on the affected columns will 
>improve performance.
>
>3) If the plan looks acceptable, but the query is still slow, then you may 
>need to dig deeper.  If it's purely an issue of a complex query over  a 
>large amount of data, then a materialized view may help (if the query is 
>run frequently).  Otherwise, you may need to examine where the bottleneck 
>is coming from-- is it cpu, disk, or something else (i.e. such as other 
>user's use of the system)?
>
>Ultimately, sometimes a query is just slow.  Our similarity table is many, 
>many gigabytes (I believe upwards of 30), and any complex query on that 
>amount of data is going to be slow-- there's only so much optimization 
>that can occur.
>
>Hope this helps,
>
>Mike
>
>
>Sucheta Tripathy wrote:
>>Folks,
>>I don't know if any of you encountered this problem, but for us, it has 
>>been a major bottleneck to query the similarity and similarityspan tables 
>>to get the desirable output.
>>For example if I say a query:
>>$sql="select  /*+CHOOSE*/   distinct ss.query_start, 
>>ss.query_end,ss.is_reversed
>>,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 
>>'e' || TO
>>_CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id  from 
>>dots.similarity s
>>, dots.similarityspan ss, dots.transcript t, dots.externalaasequence 
>>eaa  where
>>t.na_feature_id=464205 and t.gene=to_char(s.query_id) and 
>>s.similarity_id=ss.sim
>>ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by 
>>s.similarity_
>>id";
>>It takes atleast 5 minutes. One thing to note here is here for the 
>>transcript corresponding to the na_feature_id is blasted against only one 
>>database, so the query is quite straight forward. But there are instances 
>>where the scaffolds are blasted aganst 23 different organisms. In that 
>>case, in order to obtain the value against one particular database, I use 
>>much nested query going and getting the value from externaldatabase and 
>>externaldatabaserelease and then using it in similarity table.
>>I was wondering if anyone has any efficient way of querying these tables.
>>Thanks
>>Sucheta
>>
>>-------------------------------------------------------
>>SF email is sponsored by - The IT Product Guide
>>Read honest & candid reviews on hundreds of IT Products from real users.
>>Discover which products truly live up to the hype. Start reading now.
>>http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>>_______________________________________________
>>Gusdev-gusdev mailing list
>>Gus...@li...
>>https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev