From: Alberto D. <da...@io...> - 2005-02-15 15:46:08
Thanks Bindu,

We will have a look on it... also, just found a Bioperl module for GlimmerM:

http://doc.bioperl.org/bioperl-live/Bio/Tools/Glimmer.html

Cheers, Alberto

On Mon, 2005-02-14 at 14:45 -0500, Bindu Gajria wrote:
> hi Alberto -
> PlasmoDB project uses a plugin to load the GlimmerM results; it is
> GUS::Common::Plugin::ImportPlasmoDBPrediction plugin in the Sanger cvs
> repository. however, please note that this plugin is not generalized,
> and has been used here only for the PlasmoDB project so far.
> It would be useful to generalize this plugin some day, so that all can
> benefit.
>
> Bindu
>
> On Feb 11, 2005, at 12:44 PM, Alberto Davila wrote:
>
> > Hey Steve, Thomas,
> >
> > Thanks a lot for the tips, really helpful.. now, few more questions:
> >
> >> ok. NR = NRDB
> >>
> >> the way we have used gus with similarities is that both the query and
> >> subject are loaded into gus. As thomas explained, the similarity table
> >> captures similarity between sequences that are in gus.
> >>
> >> our approach has always been to just load (warehouse) the entire subject
> >> database (NR, EST) that we are blasting against.
> >>
> >> the current plugins and blastSimilarity are set up for this.
> >>
> >> obviously, this takes a lot of disk space. two major efficiencies that
> >> we don't currently have plugins for would be:
> >> 1. to only store in gus a *reference* to the external sequence (ie,
> >> don't store the actgs).
> >> 2. only store in gus the sequences that actually have similarities
> >
> > Option 2 sound better for us, since we will be blasting against several
> > databases (> 10GB databases)
> >
> > What about the plugins to load Interpro and "gene finder" (glimmer, etc)
> > results ? Is there any at all ?
> >
> > Cheers, Alberto
> >
> >>
> >> steve
> >>
> >> Alberto Davila wrote:
> >>
> >>> All the blastable databases I mentioned are standard databases from NCBI
> >>> (ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
> >>>
> >>> NT = nucleotides
> >>>
> >>> ~30000 entries from genbank (genbank format) are loaded into GUS now.
> >>>
> >>> Not sure about your "NRDB", I know NR from NCBI that is a collection of
> >>> aminoacid entries, could it be the same ?
> >>>
> >>> Alberto
> >>>
> >>> On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
> >>>
> >>>> (what is NT?)
> >>>>
> >>>> which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
> >>>> gus?
> >>>>
> >>>> steve
> >>>>
> >>>> Alberto Davila wrote:
> >>>>
> >>>>> Query:
> >>>>>
> >>>>> Either sequences from genbank (genbank format) or sequences generated in
> >>>>> the lab (fasta format)
> >>>>>
> >>>>> Blastable databases (all are formatted databases from NCBI):
> >>>>>
> >>>>> NR
> >>>>> NT
> >>>>> EST
> >>>>>
> >>>>> Alberto
> >>>>>
> >>>>> On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
> >>>>>
> >>>>>> for the blast, what are the query sequences and what are the blastable
> >>>>>> databases?
> >>>>>>
> >>>>>> steve
> >>>>>>
> >>>>>> Alberto Davila wrote:
> >>>>>>
> >>>>>>> Basically we will use sequences (loaded into GUS with the GBParser) for
> >>>>>>> NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
> >>>>>>> used for Interpro analyses. Results of both (Blast and Interpro) will be
> >>>>>>> loaded into GUS.
> >>>>>>> We will parse specific things from the Blast results, I would say:
> >>>>>>>
> >>>>>>> `Gi`
> >>>>>>> `Accession`
> >>>>>>> `Description`
> >>>>>>> `E_value`
> >>>>>>> `Score`
> >>>>>>> `Length`
> >>>>>>> `Frame_Query`
> >>>>>>> `Frame_Hit`
> >>>>>>> `Identical`
> >>>>>>> `Hsp_Frac_Identical`
> >>>>>>> `Conserved`
> >>>>>>> `Hsp_Frac_Conserved`
> >>>>>>> `Query_Start`
> >>>>>>> `Query_End`
> >>>>>>> `Hit_Start`
> >>>>>>> `Hit_End`
> >>>>>>> `Hsp_Align`
> >>>>>>> `database_letters`
> >>>>>>> `database_entries`
> >>>>>>>
> >>>>>>> We already have a Bioperl parser for that (specific for another system:
> >>>>>>> GARSA) that could be adapted to GUS, problem being we are not sure what
> >>>>>>> tables should be used to store those data in GUS.
> >>>>>>>
> >>>>>>> Cheers, Alberto
> >>>>>>>
> >>>>>>> On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
> >>>>>>>
> >>>>>>>> what are you planning on blasting?
> >>>>>>>>
> >>>>>>>> steve
> >>>>>>>>
> >>>>>>>> Alberto Davila wrote:
> >>>>>>>>
> >>>>>>>>> Hi Steve,
> >>>>>>>>>
> >>>>>>>>> On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
> >>>>>>>>>
> >>>>>>>>>> poliana-
> >>>>>>>>>>
> >>>>>>>>>> oops, the usage statement for LoadBlastSimFast is out of date. it
> >>>>>>>>>> should instruct you to use the blastSimilarity command.
> >>>>>>>>>>
> >>>>>>>>>> LoadBlastSimFast makes a big assumption, that the subject and query
> >>>>>>>>>> sequences are in GUS, and their def. lines have GUS primary keys.
> >>>>>>>>>>
> >>>>>>>>>> Are your sequences already loaded into GUS?
> >>>>>>>>>>
> >>>>>>>>> They are not, there would be any howto/tips for that plugin ? We will
> >>>>>>>>> certainly need a plugin to load "Interpro" and "ORF finding" results
> >>>>>>>>> into GUS... If they are not available, then maybe we will have to write
> >>>>>>>>> them ...
> >>>>>>>>>
> >>>>>>>>> Cheers, Alberto
> >>>>>>>>>
> >>>>>>>>>> steve
> >>>>>>>>>>
> >>>>>>>>>> Poliana Mateus wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hello all,
> >>>>>>>>>>>
> >>>>>>>>>>> Where can find the script parseBlastFilesForSimilarity.pl??
> >>>>>>>>>>> I'm trying to run LoadBlastSimFast...
> >>>>>>>>>>>
> >>>>>>>>>>> Poliana
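
Steve's second suggestion in the thread (only store in gus the subject sequences that actually have similarities) could be prototyped outside GUS roughly as follows. This is only a sketch under stated assumptions, not an existing GUS plugin: it assumes the BLAST run was written in the standard 12-column tab-separated tabular format (subject ID in column 2, e-value in column 11), and the function names and the e-value cutoff are made up for illustration.

```python
def subject_ids_with_hits(tabular_lines, max_evalue=1e-5):
    """Collect subject IDs that have at least one hit at or below max_evalue.

    Assumes the standard 12-column tabular BLAST layout:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, e-value, bit score.
    """
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a well-formed hit line
        subject_id, evalue = cols[1], float(cols[10])
        if evalue <= max_evalue:
            ids.add(subject_id)
    return ids


def filter_fasta(fasta_lines, keep_ids):
    """Yield (def_line, sequence) for FASTA records whose ID is in keep_ids.

    The record ID is taken to be the first whitespace-delimited word
    of the definition line (hypothetical convention for this sketch).
    """
    def wanted(header):
        words = header.split()
        return bool(words) and words[0] in keep_ids

    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and wanted(header):
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    # emit the final record, if it qualifies
    if header is not None and wanted(header):
        yield header, "".join(chunks)
```

Only the records yielded by filter_fasta would then need to be loaded as subject sequences, which is the point of option 2 for databases in the > 10GB range. How those records map onto GUS tables is not covered here.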