From: Steve F. <sfi...@pc...> - 2005-02-14 20:53:37
Poliana-

the only blast plugins we have are LoadBlastSimFast and LoadBlastSimilarityPK. the only tables are Similarity and SimilaritySpan.

steve

Poliana Mateus wrote:

>Hi Steve
>
>I need to insert data into GUS (BLAST results), such as:
>
>----------------------------------------------------
>data extracted by our script
>----------------------------------------------------
>query_name
>name
>accession
>description
>significance
>raw_score
>length
>num_identical
>frac_identical
>num_conserved
>frac_conserved
>start('query')
>end('query')
>start('hit')
>end('hit')
>----------------------------------------------------
>
>Analyzing the LoadBlastSimFast plugin, I verified that it inserts into
>tables DoTs.Similarity and DoTs.SimilaritySpan, both of which accept only
>numeric data.
>Are there other tables in GUS that store BLAST results?
>
>Poliana
>
>On Fri, 11 Feb 2005 13:50:32 -0500, Steve Fischer
><sfi...@pc...> wrote:
>
>>see below
>>
>>Alberto Davila wrote:
>>
>>>We are doing this for Garsa (another system) .. basically we have a
>>>bioperl parser (Bio::SearchIO) that reads the Blast results file and
>>>extracts all the needed info (to the "Blast_Hit" table)... and also loads
>>>into a given table (eg: External_DB) all the sequences (in fasta format)
>>>presenting similarity with the queries... at the end we have "Blast_Hit"
>>>and "External_DB" populated with the same script.
>>>
>>wow, great. could you make a gus plugin from that?
>>
>>>Regarding Interpro and Glimmer, the main problem is to know in which
>>>tables we should load the parsed results ?
>>>
>>describe the info you want to store.
>>
>>steve
>>
>>>Alberto
>>>
>>>On Fri, 2005-02-11 at 13:21 -0500, Y. Thomas Gan wrote:
>>>
>>>>I was going to give the same answer steve gave for interpro and gene
>>>>finding results.
>>>>
>>>>For loading sequences into GUS, the dilemma with option 2 is: how do you
>>>>know which sequence to load when you load (which is before you actually
>>>>have the similarity result)? One solution would be to initially load
>>>>complete dataset(s) but delete those without similarity after loading
>>>>similarity results.
>>>>
>>>>-Thomas
>>>>
>>>>On Fri, 11 Feb 2005, Steve Fischer wrote:
>>>>
>>>>>alberto-
>>>>>
>>>>>we've never loaded interpro, so there isn't a plugin.
>>>>>i believe plasmodb has loaded glimmer results, though i'm not sure. i have
>>>>>asked a plasmodb developer to answer that question.
>>>>>
>>>>>steve
>>>>>
>>>>>Alberto Davila wrote:
>>>>>
>>>>>>Hey Steve, Thomas,
>>>>>>
>>>>>>Thanks a lot for the tips, really helpful.. now, a few more questions:
>>>>>>
>>>>>>>ok. NR = NRDB
>>>>>>>
>>>>>>>the way we have used gus with similarities is that both the query and
>>>>>>>subject are loaded into gus. As thomas explained, the similarity table
>>>>>>>captures similarity between sequences that are in gus.
>>>>>>>our approach has always been to just load (warehouse) the entire subject
>>>>>>>database (NR, EST) that we are blasting against.
>>>>>>>
>>>>>>>the current plugins and blastSimilarity are set up for this.
>>>>>>>
>>>>>>>obviously, this takes a lot of disk space. two major efficiencies that we
>>>>>>>don't currently have plugins for would be:
>>>>>>>1. to only store in gus a *reference* to the external sequence (ie, don't
>>>>>>>store the actgs).
>>>>>>>2. only store in gus the sequences that actually have similarities
>>>>>>>
>>>>>>Option 2 sounds better for us, since we will be blasting against several
>>>>>>databases (> 10GB databases)
>>>>>>
>>>>>>What about the plugins to load Interpro and "gene finder" (glimmer, etc)
>>>>>>results ? Is there any at all ?
>>>>>>
>>>>>>Cheers, Alberto
>>>>>>
>>>>>>>steve
>>>>>>>
>>>>>>>Alberto Davila wrote:
>>>>>>>
>>>>>>>>All the blastable databases I mentioned are standard databases from NCBI
>>>>>>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>>>>>>
>>>>>>>>NT = nucleotides
>>>>>>>>
>>>>>>>>~30000 entries from genbank (genbank format) are loaded into GUS now.
>>>>>>>>
>>>>>>>>Not sure about your "NRDB", I know NR from NCBI that is a collection of
>>>>>>>>aminoacid entries, could it be the same ?
>>>>>>>>
>>>>>>>>Alberto
>>>>>>>>
>>>>>>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>>>>>>
>>>>>>>>>(what is NT?)
>>>>>>>>>
>>>>>>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
>>>>>>>>>gus?
>>>>>>>>>
>>>>>>>>>steve
>>>>>>>>>
>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>
>>>>>>>>>>Query:
>>>>>>>>>>
>>>>>>>>>>Either sequences from genbank (genbank format) or sequences generated in
>>>>>>>>>>the lab (fasta format)
>>>>>>>>>>
>>>>>>>>>>Blastable databases (all are formatted databases from NCBI):
>>>>>>>>>>
>>>>>>>>>>NR
>>>>>>>>>>NT
>>>>>>>>>>EST
>>>>>>>>>>
>>>>>>>>>>Alberto
>>>>>>>>>>
>>>>>>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>>>>>>
>>>>>>>>>>>for the blast, what are the query sequences and what are the blastable
>>>>>>>>>>>databases?
>>>>>>>>>>>
>>>>>>>>>>>steve
>>>>>>>>>>>
>>>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>>>
>>>>>>>>>>>>Basically we will use sequences (loaded into GUS with the GBParser) for
>>>>>>>>>>>>NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
>>>>>>>>>>>>used for Interpro analyses.
>>>>>>>>>>>>Results of both (Blast and Interpro) will be
>>>>>>>>>>>>loaded into GUS. We will parse specific things from the Blast results, I
>>>>>>>>>>>>would say:
>>>>>>>>>>>>
>>>>>>>>>>>>`Gi` `Accession` `Description` `E_value` `Score` `Length`
>>>>>>>>>>>>`Frame_Query` `Frame_Hit` `Identical` `Hsp_Frac_Identical`
>>>>>>>>>>>>`Conserved` `Hsp_Frac_Conserved` `Query_Start`
>>>>>>>>>>>>`Query_End` `Hit_Start` `Hit_End` `Hsp_Align` `database_letters`
>>>>>>>>>>>>`database_entries`
>>>>>>>>>>>>
>>>>>>>>>>>>We already have a Bioperl parser for that (specific for another system:
>>>>>>>>>>>>GARSA) that could be adapted to GUS, problem being we are not sure what
>>>>>>>>>>>>tables should be used to store those data in GUS.
>>>>>>>>>>>>
>>>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>>>
>>>>>>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>what are you planning on blasting?
>>>>>>>>>>>>>
>>>>>>>>>>>>>steve
>>>>>>>>>>>>>
>>>>>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Hi Steve,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>poliana-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date.
>>>>>>>>>>>>>>>it should instruct you to use the blastSimilarity command.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and
>>>>>>>>>>>>>>>query sequences are in GUS, and their def. lines have GUS primary
>>>>>>>>>>>>>>>keys.
>>>>>>>>>>>>>>>Are your sequences already loaded into GUS?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>They are not, would there be any howto/tips for that plugin ? We will
>>>>>>>>>>>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
>>>>>>>>>>>>>>into GUS... If they are not available, then maybe we will have to write
>>>>>>>>>>>>>>them ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>steve
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Poliana Mateus wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl?
>>>>>>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Poliana
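
Thomas's suggestion in the thread above (load the complete subject dataset(s) first, then delete the sequences without similarity once the similarity results are loaded) could be sketched roughly as follows. This is only an illustration under stated assumptions, not GUS code: the `subject_sequence` and `similarity` tables, their columns, and the function `prune_unmatched_subjects` are hypothetical stand-ins rather than the real GUS schema or a GUS plugin, and SQLite is used purely to keep the example self-contained.

```python
import sqlite3


def prune_unmatched_subjects(conn: sqlite3.Connection) -> int:
    """Delete subject sequences that no similarity row points to.

    Returns the number of deleted rows. Table and column names are
    hypothetical, not the real GUS schema.
    """
    cur = conn.execute(
        """
        DELETE FROM subject_sequence
        WHERE NOT EXISTS (
            SELECT 1 FROM similarity
            WHERE similarity.subject_id = subject_sequence.subject_id
        )
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subject_sequence (
        subject_id  INTEGER PRIMARY KEY,
        accession   TEXT,
        description TEXT
    );
    CREATE TABLE similarity (
        query_id   INTEGER,
        subject_id INTEGER,
        score      REAL
    );
    """
)

# Step 1: load the complete subject dataset (three made-up entries).
conn.executemany(
    "INSERT INTO subject_sequence VALUES (?, ?, ?)",
    [
        (1, "ACC0001", "made-up entry"),
        (2, "ACC0002", "made-up entry"),
        (3, "ACC0003", "made-up entry"),
    ],
)

# Step 2: load the similarity results; here only subject 2 was hit.
conn.execute("INSERT INTO similarity VALUES (?, ?, ?)", (10, 2, 87.5))

# Step 3: drop every subject sequence that has no similarity.
removed = prune_unmatched_subjects(conn)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT subject_id FROM subject_sequence ORDER BY subject_id"
    )
]
print(removed, remaining)
```

The trade-off is the one discussed in the thread: the full subject database still has to be loaded once, so disk space is only reclaimed after the similarity results are in.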