From: Alberto D. <da...@io...> - 2005-02-11 18:45:05
We are doing this for GARSA (another system): basically, we have a bioperl parser (Bio::SearchIO) that reads the Blast results file and extracts all the needed info (into the "Blast_Hit" table), and also loads into a given table (e.g., External_DB) all the sequences (in fasta format) that show similarity to the queries. At the end, "Blast_Hit" and "External_DB" are both populated by the same script.

Regarding Interpro and Glimmer, the main problem is knowing which tables we should load the parsed results into.

Alberto

On Fri, 2005-02-11 at 13:21 -0500, Y. Thomas Gan wrote:
> I was going to give the same answer steve gave for interpro and gene
> finding results.
>
> For loading sequences into GUS, the dilemma with option 2 is: how do you
> know which sequence to load when you load (which is before you actually
> have the similarity result)? One solution would be to initially load
> complete dataset(s) but delete those without similarity after loading
> similarity results.
>
> -Thomas
>
> On Fri, 11 Feb 2005, Steve Fischer wrote:
>
> > alberto-
> >
> > we've never loaded interpro, so there isn't a plugin.
> > i believe plasmodb has loaded glimmer results, though i'm not sure. i have
> > asked a plasmodb developer to answer that question.
> >
> > steve
> >
> > Alberto Davila wrote:
> >
> >> Hey Steve, Thomas,
> >>
> >> Thanks a lot for the tips, really helpful. Now, a few more questions:
> >>
> >>> ok. NR = NRDB
> >>>
> >>> the way we have used gus with similarities is that both the query and
> >>> subject are loaded into gus. As thomas explained, the similarity table
> >>> captures similarity between sequences that are in gus.
> >>> our approach has always been to just load (warehouse) the entire subject
> >>> database (NR, EST) that we are blasting against.
> >>>
> >>> the current plugins and blastSimilarity are set up for this.
> >>>
> >>> obviously, this takes a lot of disk space. two major efficiencies that we
> >>> don't currently have plugins for would be:
> >>> 1. to only store in gus a *reference* to the external sequence (i.e., don't
> >>> store the actgs).
> >>> 2. only store in gus the sequences that actually have similarities
> >>
> >> Option 2 sounds better for us, since we will be blasting against several
> >> databases (> 10 GB databases).
> >>
> >> What about plugins to load Interpro and "gene finder" (glimmer, etc.)
> >> results? Are there any at all?
> >>
> >> Cheers, Alberto
> >>
> >>> steve
> >>>
> >>> Alberto Davila wrote:
> >>>
> >>>> All the blastable databases I mentioned are standard databases from NCBI
> >>>> (ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
> >>>>
> >>>> NT = nucleotides
> >>>>
> >>>> ~30000 entries from genbank (genbank format) are loaded into GUS now.
> >>>>
> >>>> Not sure about your "NRDB". I know NR from NCBI, which is a collection of
> >>>> amino acid entries; could it be the same?
> >>>>
> >>>> Alberto
> >>>>
> >>>> On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
> >>>>
> >>>>> (what is NT?)
> >>>>>
> >>>>> which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
> >>>>> gus?
> >>>>>
> >>>>> steve
> >>>>>
> >>>>> Alberto Davila wrote:
> >>>>>
> >>>>>> Query:
> >>>>>>
> >>>>>> Either sequences from genbank (genbank format) or sequences generated
> >>>>>> in the lab (fasta format)
> >>>>>>
> >>>>>> Blastable databases (all are formatted databases from NCBI):
> >>>>>>
> >>>>>> NR
> >>>>>> NT
> >>>>>> EST
> >>>>>>
> >>>>>> Alberto
> >>>>>>
> >>>>>> On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
> >>>>>>
> >>>>>>> for the blast, what are the query sequences and what are the blastable
> >>>>>>> databases?
> >>>>>>>
> >>>>>>> steve
> >>>>>>>
> >>>>>>> Alberto Davila wrote:
> >>>>>>>
> >>>>>>>> Basically we will use sequences (loaded into GUS with the GBParser) for
> >>>>>>>> NCBI Blast (Blastx, Blastp and TBlastX); the same sequences will also be
> >>>>>>>> used for Interpro analyses. Results of both (Blast and Interpro) will be
> >>>>>>>> loaded into GUS. We will parse specific things from the Blast results,
> >>>>>>>> I would say:
> >>>>>>>>
> >>>>>>>> `Gi` `Accession` `Description` `E_value` `Score` `Length`
> >>>>>>>> `Frame_Query` `Frame_Hit` `Identical` `Hsp_Frac_Identical`
> >>>>>>>> `Conserved` `Hsp_Frac_Conserved` `Query_Start` `Query_End`
> >>>>>>>> `Hit_Start` `Hit_End` `Hsp_Align` `database_letters`
> >>>>>>>> `database_entries`
> >>>>>>>>
> >>>>>>>> We already have a Bioperl parser for that (specific to another system,
> >>>>>>>> GARSA) that could be adapted to GUS; the problem is that we are not sure
> >>>>>>>> which tables should be used to store that data in GUS.
> >>>>>>>>
> >>>>>>>> Cheers, Alberto
> >>>>>>>>
> >>>>>>>> On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
> >>>>>>>>
> >>>>>>>>> what are you planning on blasting?
> >>>>>>>>>
> >>>>>>>>> steve
> >>>>>>>>>
> >>>>>>>>> Alberto Davila wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Steve,
> >>>>>>>>>>
> >>>>>>>>>> On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
> >>>>>>>>>>
> >>>>>>>>>>> poliana-
> >>>>>>>>>>>
> >>>>>>>>>>> oops, the usage statement for LoadBlastSimFast is out of date.
> >>>>>>>>>>> it should instruct you to use the blastSimilarity command.
> >>>>>>>>>>>
> >>>>>>>>>>> LoadBlastSimFast makes a big assumption, that the subject and
> >>>>>>>>>>> query sequences are in GUS, and their def. lines have GUS primary
> >>>>>>>>>>> keys.
> >>>>>>>>>>> Are your sequences already loaded into GUS?
> >>>>>>>>>>
> >>>>>>>>>> They are not. Are there any how-tos/tips for that plugin? We will
> >>>>>>>>>> certainly need a plugin to load "Interpro" and "ORF finding" results
> >>>>>>>>>> into GUS... If they are not available, then maybe we will have to
> >>>>>>>>>> write them...
> >>>>>>>>>>
> >>>>>>>>>> Cheers, Alberto
> >>>>>>>>>>
> >>>>>>>>>>> steve
> >>>>>>>>>>>
> >>>>>>>>>>> Poliana Mateus wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Where can I find the script parseBlastFilesForSimilarity.pl?
> >>>>>>>>>>>> I'm trying to run LoadBlastSimFast...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Poliana
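Alberto's plan is to pull a set of per-hit fields out of the Blast reports before deciding where they go in GUS. As a rough, hypothetical sketch (not GARSA's bioperl parser and not a GUS plugin), the Python below reads NCBI's 12-column tabular Blast output (blastall -m 8, or -outfmt 6 in BLAST+) into one record per HSP. Only a subset of the listed fields (Accession, E_value, Score, Query/Hit start and end, fraction identical) exists in that format; frames, Hsp_Align and database_letters/entries would need a full-report parser such as Bio::SearchIO. The `BlastHit` class and `parse_tabular` function are illustrative names, and no GUS table mapping is assumed.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BlastHit:
    """One HSP from tabular Blast output (hypothetical field names)."""
    query_id: str
    accession: str          # subject ID exactly as Blast printed it
    frac_identical: float   # percent identity / 100
    align_length: int
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    e_value: float
    score: float            # bit score


def parse_tabular(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Parse the 12-column tabular format (blastall -m 8 / BLAST+ -outfmt 6)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and the '#' comment lines of the commented variants.
        if not row or row[0].startswith("#"):
            continue
        (qid, sid, pident, length, _mismatches, _gap_opens,
         qstart, qend, sstart, send, evalue, bitscore) = row[:12]
        # Defensive: normalize e-values written without a mantissa, e.g. "e-100".
        if evalue.startswith("e"):
            evalue = "1" + evalue
        yield BlastHit(
            query_id=qid,
            accession=sid,
            frac_identical=float(pident) / 100.0,
            align_length=int(length),
            query_start=int(qstart),
            query_end=int(qend),
            hit_start=int(sstart),
            hit_end=int(send),
            e_value=float(evalue),
            score=float(bitscore),
        )
```

Keeping parsing separate from loading means each `BlastHit` can later be written to whichever table the GUS developers recommend, without changing the parser.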