From: Steve F. <sfi...@pc...> - 2005-02-15 02:17:32
alberto-

you're right. the similarity tables in gus capture the essence of the
similarity, not the details.

am i correct in thinking that the information you are describing is in a
1-1 relationship with a SimilaritySpan?

If so, you could prototype your idea by adding a table called
SimilaritySpanDetails to your gus. It would have a link to SimilaritySpan.

steve

davila wrote:

>Hi Steve,
>
>I would like to know if you think it would be interesting to expand the
>"Similarity and SimilaritySpan" tables ? Some blast results,
>eg: query_string, hit_string, homology_string and alignment don't appear
>to be represented in those tables (of course, I might be wrong)...
>
>Ideally, those tables should be able to store most data parsed from Blast
>results, an example of the most important data is listed in the
>Bio::SearchIO system of Bioperl: http://bioperl.org/HOWTOs/SearchIO/use.html
>
>Cheers, Alberto
>
> -----Original Message-----
> From: Steve Fischer [mailto:sfi...@pc...]
> Sent: Mon 14/2/2005 17:53
> To: Poliana Mateus
> Cc: davila; gus...@li...
> Subject: Re: [Gusdev-gusdev] parseBlastFilesForSimilarity.pl
>
> Poliana-
>
> the only blast plugins we have are LoadBlastSimFast and
> LoadBlastSimilarityPK.
>
> the only tables are Similarity and SimilaritySpan
>
> steve
>
> Poliana Mateus wrote:
>
> >Hi Steve
> >
> >I need to insert data into GUS (blast results) such as:
> >
> >----------------------------------------------------
> >data extracted by our script
> >----------------------------------------------------
> >query_name
> >name
> >accession
> >description
> >significance
> >raw_score
> >length
> >num_identical
> >frac_identical
> >num_conserved
> >frac_conserved
> >start('query')
> >end('query')
> >start('hit')
> >end('hit')
> >----------------------------------------------------
> >
> >Analyzing the LoadBlastSimFast Plugin I verified that it inserts into
> >tables DoTs.Similarity and DoTs.SimilaritySpan, both of which accept only
> >numeric data.
> >Are there other tables in GUS that store Blast results?
> >
> >Poliana
> >
> >On Fri, 11 Feb 2005 13:50:32 -0500, Steve Fischer
> ><sfi...@pc...> wrote:
> >
> >>see below
> >>
> >>Alberto Davila wrote:
> >>
> >>>We are doing this for Garsa (another system) .. basically we have a
> >>>bioperl parser (Bio::Search::IO) that reads the Blast results file and
> >>>extracts all the needed info (to the "Blast_Hit" table)... and also loads
> >>>into a given table (eg: External_DB) all the sequences (in fasta format)
> >>>presenting similarity with the queries... at the end we have "Blast_Hit"
> >>>and "External_DB" populated with the same script.
> >>>
> >>wow, great. could you make a gus plugin from that?
> >>
> >>>Regarding Interpro and Glimmer, the main problem is to know in which
> >>>tables we should load the parsed results ?
> >>>
> >>describe the info you want to store.
> >>
> >>steve
> >>
> >>>Alberto
> >>>
> >>>On Fri, 2005-02-11 at 13:21 -0500, Y. Thomas Gan wrote:
> >>>
> >>>>I was going to give the same answer steve gave for interpro and gene
> >>>>finding results.
> >>>>
> >>>>For loading sequences into GUS, the dilemma with option 2 is: how do you
> >>>>know which sequence to load when you load (which is before you actually
> >>>>have the similarity result)? One solution would be to initially load
> >>>>complete dataset(s) but delete those without similarity after loading
> >>>>similarity results.
> >>>>
> >>>>-Thomas
> >>>>
> >>>>On Fri, 11 Feb 2005, Steve Fischer wrote:
> >>>>
> >>>>>alberto-
> >>>>>
> >>>>>we've never loaded interpro, so there isn't a plugin.
> >>>>>i believe plasmodb has loaded glimmer results, though i'm not sure. i have
> >>>>>asked a plasmodb developer to answer that question.
> >>>>>
> >>>>>steve
> >>>>>
> >>>>>Alberto Davila wrote:
> >>>>>
> >>>>>>Hey Steve, Thomas,
> >>>>>>
> >>>>>>Thanks a lot for the tips, really helpful.. now, a few more questions:
> >>>>>>
> >>>>>>>ok. NR = NRDB
> >>>>>>>
> >>>>>>>the way we have used gus with similarities is that both the query and
> >>>>>>>subject are loaded into gus. As thomas explained, the similarity table
> >>>>>>>captures similarity between sequences that are in gus.
> >>>>>>>our approach has always been to just load (warehouse) the entire subject
> >>>>>>>database (NR, EST) that we are blasting against.
> >>>>>>>
> >>>>>>>the current plugins and blastSimilarity are set up for this.
> >>>>>>>
> >>>>>>>obviously, this takes a lot of disk space. two major efficiencies that we
> >>>>>>>don't currently have plugins for would be:
> >>>>>>>1. to only store in gus a *reference* to the external sequence (ie, don't
> >>>>>>>store the actgs).
> >>>>>>>2. only store in gus the sequences that actually have similarities
> >>>>>>>
> >>>>>>Option 2 sounds better for us, since we will be blasting against several
> >>>>>>databases (> 10GB databases)
> >>>>>>
> >>>>>>What about the plugins to load Interpro and "gene finder" (glimmer, etc)
> >>>>>>results ? Are there any at all ?
> >>>>>>
> >>>>>>Cheers, Alberto
> >>>>>>
> >>>>>>>steve
> >>>>>>>
> >>>>>>>Alberto Davila wrote:
> >>>>>>>
> >>>>>>>>All the blastable databases I mentioned are standard databases from NCBI
> >>>>>>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
> >>>>>>>>
> >>>>>>>>NT = nucleotides
> >>>>>>>>
> >>>>>>>>~30000 entries from genbank (genbank format) are loaded into GUS now.
> >>>>>>>>
> >>>>>>>>Not sure about your "NRDB", I know NR from NCBI that is a collection of
> >>>>>>>>amino acid entries, could it be the same ?
> >>>>>>>>
> >>>>>>>>Alberto
> >>>>>>>>
> >>>>>>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
> >>>>>>>>
> >>>>>>>>>(what is NT?)
> >>>>>>>>>
> >>>>>>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
> >>>>>>>>>gus?
> >>>>>>>>>
> >>>>>>>>>steve
> >>>>>>>>>
> >>>>>>>>>Alberto Davila wrote:
> >>>>>>>>>
> >>>>>>>>>>Query:
> >>>>>>>>>>
> >>>>>>>>>>Either sequences from genbank (genbank format) or sequences generated
> >>>>>>>>>>in the lab (fasta format)
> >>>>>>>>>>
> >>>>>>>>>>Blastable databases (all are formatted databases from NCBI):
> >>>>>>>>>>
> >>>>>>>>>>NR
> >>>>>>>>>>NT
> >>>>>>>>>>EST
> >>>>>>>>>>
> >>>>>>>>>>Alberto
> >>>>>>>>>>
> >>>>>>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
> >>>>>>>>>>
> >>>>>>>>>>>for the blast, what are the query sequences and what are the blastable
> >>>>>>>>>>>databases?
> >>>>>>>>>>>
> >>>>>>>>>>>steve
> >>>>>>>>>>>
> >>>>>>>>>>>Alberto Davila wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>>Basically we will use sequences (loaded into GUS with the GBParser)
> >>>>>>>>>>>>for NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will
> >>>>>>>>>>>>also be used for Interpro analyses. Results of both (Blast and
> >>>>>>>>>>>>Interpro) will be loaded into GUS. We will parse specific things
> >>>>>>>>>>>>from the Blast results, I would say:
> >>>>>>>>>>>>
> >>>>>>>>>>>>`Gi` `Accession` `Description` `E_value` `Score` `Length`
> >>>>>>>>>>>>`Frame_Query` `Frame_Hit` `Identical` `Hsp_Frac_Identical`
> >>>>>>>>>>>>`Conserved` `Hsp_Frac_Conserved` `Query_Start` `Query_End`
> >>>>>>>>>>>>`Hit_Start` `Hit_End` `Hsp_Align` `database_letters`
> >>>>>>>>>>>>`database_entries`
> >>>>>>>>>>>>
> >>>>>>>>>>>>We already have a Bioperl parser for that (specific for another
> >>>>>>>>>>>>system: GARSA) that could be adapted to GUS, problem being we are not
> >>>>>>>>>>>>sure what tables should be used to store those data in GUS.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Cheers, Alberto
> >>>>>>>>>>>>
> >>>>>>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>what are you planning on blasting?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>steve
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>Alberto Davila wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>Hi Steve,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>poliana-
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date.
> >>>>>>>>>>>>>>>it should instruct you to use the blastSimilarity command.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and
> >>>>>>>>>>>>>>>query sequences are in GUS, and their def. lines have GUS primary
> >>>>>>>>>>>>>>>keys.
> >>>>>>>>>>>>>>>Are your sequences already loaded into GUS?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>They are not, would there be any howto/tips for that plugin ? We
> >>>>>>>>>>>>>>will certainly need a plugin to load "Interpro" and "ORF finding"
> >>>>>>>>>>>>>>results into GUS... If they are not available, then maybe we will
> >>>>>>>>>>>>>>have to write them ...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Cheers, Alberto
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>steve
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>Poliana Mateus wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>Hello all,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl??
> >>>>>>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>Poliana
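
A note on the SimilaritySpanDetails suggestion at the top of this message: the idea of a details table in a 1-1 relationship with SimilaritySpan might be prototyped roughly as below. This is only a sketch under several assumptions. SQLite (through Python's sqlite3 module) stands in for the GUS database, the SimilaritySpan table here is a simplified stand-in and not the real GUS definition, and every column and function name is hypothetical. The three string columns are simply the fields Alberto listed (query_string, hit_string, homology_string).

```python
import sqlite3


def open_prototype_db(path=":memory:"):
    """Open a SQLite database holding the two prototype tables."""
    conn = sqlite3.connect(path)
    # SQLite leaves foreign key checks off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Simplified stand-in for the existing span table (hypothetical columns).
        CREATE TABLE SimilaritySpan (
            similarity_span_id INTEGER PRIMARY KEY,
            query_start   INTEGER NOT NULL,
            query_end     INTEGER NOT NULL,
            subject_start INTEGER NOT NULL,
            subject_end   INTEGER NOT NULL
        );
        -- Proposed details table. UNIQUE on the foreign key keeps it 1-1.
        CREATE TABLE SimilaritySpanDetails (
            similarity_span_details_id INTEGER PRIMARY KEY,
            similarity_span_id INTEGER NOT NULL UNIQUE
                REFERENCES SimilaritySpan (similarity_span_id),
            query_string    TEXT NOT NULL,
            hit_string      TEXT NOT NULL,
            homology_string TEXT NOT NULL
        );
        """
    )
    return conn


def add_span(conn, query_start, query_end, subject_start, subject_end):
    """Insert one span row and return its primary key."""
    cur = conn.execute(
        "INSERT INTO SimilaritySpan"
        " (query_start, query_end, subject_start, subject_end)"
        " VALUES (?, ?, ?, ?)",
        (query_start, query_end, subject_start, subject_end),
    )
    return cur.lastrowid


def add_span_details(conn, span_id, query_string, hit_string, homology_string):
    """Attach the alignment strings to an existing span (at most once per span)."""
    conn.execute(
        "INSERT INTO SimilaritySpanDetails"
        " (similarity_span_id, query_string, hit_string, homology_string)"
        " VALUES (?, ?, ?, ?)",
        (span_id, query_string, hit_string, homology_string),
    )


def span_with_details(conn, span_id):
    """Return the span coordinates joined with its alignment strings, or None."""
    return conn.execute(
        "SELECT s.query_start, s.query_end,"
        "       d.query_string, d.hit_string, d.homology_string"
        " FROM SimilaritySpan s"
        " JOIN SimilaritySpanDetails d"
        "   ON d.similarity_span_id = s.similarity_span_id"
        " WHERE s.similarity_span_id = ?",
        (span_id,),
    ).fetchone()
```

With this layout, the existing Similarity and SimilaritySpan tables stay untouched, and a loader that already writes spans would only need one extra insert per span to keep the alignment text. A second details row for the same span, or a details row pointing at a span that does not exist, is rejected by the database with sqlite3.IntegrityError.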