From: Sucheta T. <su...@vb...> - 2005-02-23 15:23:10
|
Folks, I don't know if any of you encountered this problem, but for us, it has been a major bottleneck to query the similarity and similarityspan tables to get the desirable output. For example if I say a query: $sql="select /*+CHOOSE*/ distinct ss.query_start, ss.query_end,ss.is_reversed ,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || 'e' || TO _CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id from dots.similarity s , dots.similarityspan ss, dots.transcript t, dots.externalaasequence eaa where t.na_feature_id=464205 and t.gene=to_char(s.query_id) and s.similarity_id=ss.sim ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by s.similarity_ id"; It takes atleast 5 minutes. One thing to note here is here for the transcript corresponding to the na_feature_id is blasted against only one database, so the query is quite straight forward. But there are instances where the scaffolds are blasted aganst 23 different organisms. In that case, in order to obtain the value against one particular database, I use much nested query going and getting the value from externaldatabase and externaldatabaserelease and then using it in similarity table. I was wondering if anyone has any efficient way of querying these tables. Thanks Sucheta |
From: Michael S. <msa...@pc...> - 2005-02-23 15:33:57
|
Hi Sucheta, A few general query optimization points: 1) It's best to start by looking at the query execution plan, to determine what exactly is going on. In Oracle, you'll need to create the necessary plan table, and then use the explain plan for... command. Refer to Oracle's documentation for details. 2) The plan may reveal some ways in which you can speed up the query. For example, if the query is resulting in several full table scans of very large tables, it's very likely that an index on the affected columns will improve performance. 3) If the plan looks acceptable, but the query is still slow, then you may need to dig deeper. If it's purely an issue of a complex query over a large amount of data, then a materialized view may help (if the query is run frequently). Otherwise, you may need to examine where the bottleneck is coming from-- is it cpu, disk, or something else (i.e. such as other user's use of the system)? Ultimately, sometimes a query is just slow. Our similarity table is many, many gigabytes (I believe upwards of 30), and any complex query on that amount of data is going to be slow-- there's only so much optimization that can occur. Hope this helps, Mike Sucheta Tripathy wrote: > Folks, > > I don't know if any of you encountered this problem, but for us, it has > been a major bottleneck to query the similarity and similarityspan > tables to get the desirable output. > > For example if I say a query: > > $sql="select /*+CHOOSE*/ distinct ss.query_start, > ss.query_end,ss.is_reversed > ,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || > 'e' || TO > _CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id from > dots.similarity s > , dots.similarityspan ss, dots.transcript t, dots.externalaasequence > eaa where > t.na_feature_id=464205 and t.gene=to_char(s.query_id) and > s.similarity_id=ss.sim > ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by > s.similarity_ > id"; > > It takes atleast 5 minutes. One thing to note here is here for the > transcript corresponding to the na_feature_id is blasted against only > one database, so the query is quite straight forward. But there are > instances where the scaffolds are blasted aganst 23 different organisms. > In that case, in order to obtain the value against one particular > database, I use much nested query going and getting the value from > externaldatabase and externaldatabaserelease and then using it in > similarity table. > > I was wondering if anyone has any efficient way of querying these tables. > > Thanks > > Sucheta > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
From: Sucheta T. <su...@vb...> - 2005-02-23 15:40:25
|
Hi Mike, Thanks for the tips. In fact here at VBI, we have the DBA team helping us out in checking the performance issues, and after doing some research Sally(one of our DBA) informed me that the large dataset may be the cause of this. Probably one option left for us is to have materialized views on these tables. Sucheta At 10:33 AM 2/23/2005 -0500, Michael Saffitz wrote: >Hi Sucheta, > >A few general query optimization points: > >1) It's best to start by looking at the query execution plan, to determine >what exactly is going on. In Oracle, you'll need to create the necessary >plan table, and then use the explain plan for... command. Refer to >Oracle's documentation for details. > >2) The plan may reveal some ways in which you can speed up the query. For >example, if the query is resulting in several full table scans of very >large tables, it's very likely that an index on the affected columns will >improve performance. > >3) If the plan looks acceptable, but the query is still slow, then you may >need to dig deeper. If it's purely an issue of a complex query over a >large amount of data, then a materialized view may help (if the query is >run frequently). Otherwise, you may need to examine where the bottleneck >is coming from-- is it cpu, disk, or something else (i.e. such as other >user's use of the system)? > >Ultimately, sometimes a query is just slow. Our similarity table is many, >many gigabytes (I believe upwards of 30), and any complex query on that >amount of data is going to be slow-- there's only so much optimization >that can occur. > >Hope this helps, > >Mike > > >Sucheta Tripathy wrote: >>Folks, >>I don't know if any of you encountered this problem, but for us, it has >>been a major bottleneck to query the similarity and similarityspan tables >>to get the desirable output. >>For example if I say a query: >>$sql="select /*+CHOOSE*/ distinct ss.query_start, >>ss.query_end,ss.is_reversed >>,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || >>'e' || TO >>_CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id from >>dots.similarity s >>, dots.similarityspan ss, dots.transcript t, dots.externalaasequence >>eaa where >>t.na_feature_id=464205 and t.gene=to_char(s.query_id) and >>s.similarity_id=ss.sim >>ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by >>s.similarity_ >>id"; >>It takes atleast 5 minutes. One thing to note here is here for the >>transcript corresponding to the na_feature_id is blasted against only one >>database, so the query is quite straight forward. But there are instances >>where the scaffolds are blasted aganst 23 different organisms. In that >>case, in order to obtain the value against one particular database, I use >>much nested query going and getting the value from externaldatabase and >>externaldatabaserelease and then using it in similarity table. >>I was wondering if anyone has any efficient way of querying these tables. >>Thanks >>Sucheta >> >>------------------------------------------------------- >>SF email is sponsored by - The IT Product Guide >>Read honest & candid reviews on hundreds of IT Products from real users. >>Discover which products truly live up to the hype. Start reading now. >>http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click >>_______________________________________________ >>Gusdev-gusdev mailing list >>Gus...@li... >>https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
From: Angel P. <an...@ma...> - 2005-02-23 15:40:10
Attachments:
show-explain.sql
|
Sucheta, It is always a good idea to include an the output of "EXPLAIN PLAN" when asking how to optimize queries. Usually well-placed indexes will take care of long running queries. *EXPLAIN PLAN* [*SET STATEMENT_ID* = 'text'] [*INTO* [schema .] table [@ dblink]] *FOR* statement; Output defaults to a plan table in your schema, unless you give it a plan table to insert into. Oracle has a script for creating this table, |UTLXPLAN.SQL| Attached is a SQL script to output the results from the plan table. Postgres must have some similar functionality. Angel Sucheta Tripathy wrote: > Folks, > > I don't know if any of you encountered this problem, but for us, it > has been a major bottleneck to query the similarity and similarityspan > tables to get the desirable output. > > For example if I say a query: > > $sql="select /*+CHOOSE*/ distinct ss.query_start, > ss.query_end,ss.is_reversed > ,ss.reading_frame,s.subject_id, to_char(s.pvalue_mant,'FM9990.099') || > 'e' || TO > _CHAR(s.pvalue_exp,'FMS999'), eaa.name, s.similarity_id from > dots.similarity s > , dots.similarityspan ss, dots.transcript t, dots.externalaasequence > eaa where > t.na_feature_id=464205 and t.gene=to_char(s.query_id) and > s.similarity_id=ss.sim > ilarity_id and s.subject_id=eaa.source_id and rownum < 10 order by > s.similarity_ > id"; > > It takes atleast 5 minutes. One thing to note here is here for the > transcript corresponding to the na_feature_id is blasted against only > one database, so the query is quite straight forward. But there are > instances where the scaffolds are blasted aganst 23 different > organisms. In that case, in order to obtain the value against one > particular database, I use much nested query going and getting the > value from externaldatabase and externaldatabaserelease and then using > it in similarity table. > > I was wondering if anyone has any efficient way of querying these tables. > > Thanks > > Sucheta > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev -- Angel Pizarro Director, Bioinformatics Facility Institute for Translational Medicine and Therapeutics University of Pennsylvania 806 BRB II/III 421 Curie Blvd. Philadelphia, PA 19104-6160 P: 215-573-3736 F: 215-573-9004 |