From: Alberto D. <da...@io...> - 2005-02-11 17:36:17
Hey Steve, Thomas,

Thanks a lot for the tips, they were really helpful. Now, a few more questions:

> ok. NR = NRDB
>
> the way we have used gus with similarities is that both the query and
> subject are loaded into gus. As thomas explained, the similarity table
> captures similarity between sequences that are in gus.
>
> our approach has always been to just load (warehouse) the entire subject
> database (NR, EST) that we are blasting against.
>
> the current plugins and blastSimilarity are set up for this.
>
> obviously, this takes a lot of disk space. two major efficiencies that
> we don't currently have plugins for would be:
> 1. to only store in gus a *reference* to the external sequence (ie,
> don't store the actgs).
> 2. only store in gus the sequences that actually have similarities

Option 2 sounds better for us, since we will be blasting against several
databases (> 10 GB databases).

What about plugins to load Interpro and "gene finder" (glimmer, etc.)
results? Are there any at all?

Cheers, Alberto

>
> steve
>
> Alberto Davila wrote:
>
> >All the blastable databases I mentioned are standard databases from NCBI
> >(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
> >
> >NT = nucleotides
> >
> >~30000 entries from genbank (genbank format) are loaded into GUS now.
> >
> >Not sure about your "NRDB", I know NR from NCBI that is a collection of
> >aminoacid entries, could it be the same ?
> >
> >Alberto
> >
> >On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
> >
> >>(what is NT?)
> >>
> >>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
> >>gus?
> >>
> >>steve
> >>
> >>Alberto Davila wrote:
> >>
> >>>Query:
> >>>
> >>>Either sequences from genbank (genbank format) or sequences generated in
> >>>the lab (fasta format)
> >>>
> >>>Blastable databases (all are formatted databases from NCBI):
> >>>
> >>>NR
> >>>NT
> >>>EST
> >>>
> >>>Alberto
> >>>
> >>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
> >>>
> >>>>for the blast, what are the query sequences and what are the blastable
> >>>>databases?
> >>>>
> >>>>steve
> >>>>
> >>>>Alberto Davila wrote:
> >>>>
> >>>>>Basically we will use sequences (loaded into GUS with the GBParser) for
> >>>>>NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
> >>>>>used for Interpro analyses. Results of both (Blast and Interpro) will be
> >>>>>loaded into GUS. We will parse specific things from the Blast results, I
> >>>>>would say:
> >>>>>
> >>>>>`Gi`
> >>>>>`Accession`
> >>>>>`Description`
> >>>>>`E_value`
> >>>>>`Score`
> >>>>>`Length`
> >>>>>`Frame_Query`
> >>>>>`Frame_Hit`
> >>>>>`Identical`
> >>>>>`Hsp_Frac_Identical`
> >>>>>`Conserved`
> >>>>>`Hsp_Frac_Conserved`
> >>>>>`Query_Start`
> >>>>>`Query_End`
> >>>>>`Hit_Start`
> >>>>>`Hit_End`
> >>>>>`Hsp_Align`
> >>>>>`database_letters`
> >>>>>`database_entries`
> >>>>>
> >>>>>We already have a Bioperl parser for that (specific for another system:
> >>>>>GARSA) that could be adapted to GUS, problem being we are not sure what
> >>>>>tables should be used to store those data in GUS.
> >>>>>
> >>>>>Cheers, Alberto
> >>>>>
> >>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
> >>>>>
> >>>>>>what are you planning on blasting?
> >>>>>>
> >>>>>>steve
> >>>>>>
> >>>>>>Alberto Davila wrote:
> >>>>>>
> >>>>>>>Hi Steve,
> >>>>>>>
> >>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
> >>>>>>>
> >>>>>>>>poliana-
> >>>>>>>>
> >>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date. it
> >>>>>>>>should instruct you to use the blastSimilarity command.
> >>>>>>>>
> >>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and query
> >>>>>>>>sequences are in GUS, and their def. lines have GUS primary keys.
> >>>>>>>>
> >>>>>>>>Are your sequences already loaded into GUS?
> >>>>>>>>
> >>>>>>>They are not, there would be any howto/tips for that plugin ? We will
> >>>>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
> >>>>>>>into GUS... If they are not available, then maybe we will have to write
> >>>>>>>them ...
> >>>>>>>
> >>>>>>>Cheers, Alberto
> >>>>>>>
> >>>>>>>>steve
> >>>>>>>>
> >>>>>>>>Poliana Mateus wrote:
> >>>>>>>>
> >>>>>>>>>Hello all,
> >>>>>>>>>
> >>>>>>>>>Where can find the script parseBlastFilesForSimilarity.pl??
> >>>>>>>>>I'm trying to run LoadBlastSimFast...
> >>>>>>>>>
> >>>>>>>>>Poliana
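
For reference, option 2 from Steve's list (only store the subject sequences that actually have similarities) could be approximated outside GUS by pre-filtering the subject FASTA file before loading it. The following is only a rough sketch and not an existing GUS plugin. It assumes the BLAST results are available in the standard 12-column tabular format (legacy `blastall -m 8`, or `-outfmt 6` in BLAST+), and that the subject ID in column 2 matches the first token of the FASTA defline. The function names and the default E-value cutoff are made up for illustration.

```python
def subject_ids_with_hits(blast_tab_lines, max_evalue=1e-5):
    """Collect subject IDs from 12-column tabular BLAST output.

    Column 2 (index 1) is the subject ID and column 11 (index 10) is the
    E-value. The cutoff is an arbitrary illustrative default.
    """
    ids = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if float(cols[10]) <= max_evalue:
            ids.add(cols[1])
    return ids


def _seq_id(defline):
    """Return the first whitespace-delimited token of a defline, or ''."""
    parts = defline.split(None, 1)
    return parts[0] if parts else ""


def filter_fasta(fasta_lines, keep_ids):
    """Yield (defline, sequence) for records whose ID is in keep_ids."""
    defline, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one if it had a hit.
            if defline is not None and _seq_id(defline) in keep_ids:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line.strip())
    # Emit the final record if it had a hit.
    if defline is not None and _seq_id(defline) in keep_ids:
        yield defline, "".join(chunks)
```

Both functions accept any iterable of lines, so open file handles can be passed directly and a multi-gigabyte subject database is streamed rather than read into memory. The filtered records would still need to be loaded into GUS with whichever sequence-loading plugin is in use, so that the subject sequences have GUS primary keys before the similarities are loaded.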