From: Steve F. <sfi...@pc...> - 2005-02-11 18:10:16
alberto-

we've never loaded interpro, so there isn't a plugin.

i believe plasmodb has loaded glimmer results, though i'm not sure. i have asked a plasmodb developer to answer that question.

steve

Alberto Davila wrote:

>Hey Steve, Thomas,
>
>Thanks a lot for the tips, really helpful. Now, a few more questions:
>
>>ok. NR = NRDB
>>
>>the way we have used gus with similarities is that both the query and
>>subject are loaded into gus. As thomas explained, the similarity table
>>captures similarity between sequences that are in gus.
>>
>>our approach has always been to just load (warehouse) the entire subject
>>database (NR, EST) that we are blasting against.
>>
>>the current plugins and blastSimilarity are set up for this.
>>
>>obviously, this takes a lot of disk space. two major efficiencies that
>>we don't currently have plugins for would be:
>> 1. to only store in gus a *reference* to the external sequence (ie,
>>    don't store the actgs).
>> 2. to only store in gus the sequences that actually have similarities
>
>Option 2 sounds better for us, since we will be blasting against several
>databases (> 10GB databases)
>
>What about plugins to load Interpro and "gene finder" (glimmer, etc)
>results? Are there any at all?
>
>Cheers, Alberto
>
>>steve
>>
>>Alberto Davila wrote:
>>
>>>All the blastable databases I mentioned are standard databases from NCBI
>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>
>>>NT = nucleotides
>>>
>>>~30000 entries from genbank (genbank format) are loaded into GUS now.
>>>
>>>Not sure about your "NRDB". I know NR from NCBI, which is a collection of
>>>amino acid entries; could it be the same?
>>>
>>>Alberto
>>>
>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>
>>>>(what is NT?)
>>>>
>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
>>>>gus?
>>>>
>>>>steve
>>>>
>>>>Alberto Davila wrote:
>>>>
>>>>>Query:
>>>>>
>>>>>Either sequences from genbank (genbank format) or sequences generated in
>>>>>the lab (fasta format)
>>>>>
>>>>>Blastable databases (all are formatted databases from NCBI):
>>>>>
>>>>>NR
>>>>>NT
>>>>>EST
>>>>>
>>>>>Alberto
>>>>>
>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>
>>>>>>for the blast, what are the query sequences and what are the blastable
>>>>>>databases?
>>>>>>
>>>>>>steve
>>>>>>
>>>>>>Alberto Davila wrote:
>>>>>>
>>>>>>>Basically we will use sequences (loaded into GUS with the GBParser) for
>>>>>>>NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
>>>>>>>used for Interpro analyses. Results of both (Blast and Interpro) will be
>>>>>>>loaded into GUS. We will parse specific things from the Blast results, I
>>>>>>>would say:
>>>>>>>
>>>>>>>`Gi`
>>>>>>>`Accession`
>>>>>>>`Description`
>>>>>>>`E_value`
>>>>>>>`Score`
>>>>>>>`Length`
>>>>>>>`Frame_Query`
>>>>>>>`Frame_Hit`
>>>>>>>`Identical`
>>>>>>>`Hsp_Frac_Identical`
>>>>>>>`Conserved`
>>>>>>>`Hsp_Frac_Conserved`
>>>>>>>`Query_Start`
>>>>>>>`Query_End`
>>>>>>>`Hit_Start`
>>>>>>>`Hit_End`
>>>>>>>`Hsp_Align`
>>>>>>>`database_letters`
>>>>>>>`database_entries`
>>>>>>>
>>>>>>>We already have a Bioperl parser for that (specific for another system:
>>>>>>>GARSA) that could be adapted to GUS, problem being we are not sure what
>>>>>>>tables should be used to store those data in GUS.
>>>>>>>
>>>>>>>Cheers, Alberto
>>>>>>>
>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>
>>>>>>>>what are you planning on blasting?
>>>>>>>>
>>>>>>>>steve
>>>>>>>>
>>>>>>>>Alberto Davila wrote:
>>>>>>>>
>>>>>>>>>Hi Steve,
>>>>>>>>>
>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>
>>>>>>>>>>poliana-
>>>>>>>>>>
>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date. it
>>>>>>>>>>should instruct you to use the blastSimilarity command.
>>>>>>>>>>
>>>>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and query
>>>>>>>>>>sequences are in GUS, and their def. lines have GUS primary keys.
>>>>>>>>>>
>>>>>>>>>>Are your sequences already loaded into GUS?
>>>>>>>>>
>>>>>>>>>They are not. Would there be any howto/tips for that plugin? We will
>>>>>>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
>>>>>>>>>into GUS... If they are not available, then maybe we will have to write
>>>>>>>>>them ...
>>>>>>>>>
>>>>>>>>>Cheers, Alberto
>>>>>>>>>
>>>>>>>>>>steve
>>>>>>>>>>
>>>>>>>>>>Poliana Mateus wrote:
>>>>>>>>>>
>>>>>>>>>>>Hello all,
>>>>>>>>>>>
>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl?
>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>
>>>>>>>>>>>Poliana
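
Editorial sketch (not part of the original thread): the second efficiency Steve mentions, keeping only the subject sequences that actually have similarities, could be approximated outside GUS before loading. The Python below is a rough illustration, not a GUS plugin. It assumes the BLAST results are in the 12-column tab-separated layout (query id, subject id, ..., e-value, bit score) and that the subject database is available as FASTA text. The function names and the 1e-5 e-value cutoff are hypothetical choices made for this example.

```python
from typing import Iterable, Iterator, Set, Tuple


def subject_ids_with_hits(tabular_lines: Iterable[str],
                          max_evalue: float = 1e-5) -> Set[str]:
    """Collect subject IDs from 12-column tabular BLAST output.

    Assumed column layout: 0 = query id, 1 = subject id, 10 = e-value.
    """
    ids: Set[str] = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a hit line in the assumed layout
        evalue_text = fields[10]
        if evalue_text.startswith("e"):
            # Defensive: treat a bare "e-105" as "1e-105".
            evalue_text = "1" + evalue_text
        if float(evalue_text) <= max_evalue:
            ids.add(fields[1])
    return ids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_subjects(fasta_lines: Iterable[str],
                    wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only FASTA records whose first header token is in `wanted`."""
    for header, seq in read_fasta(fasta_lines):
        parts = header.split(None, 1)
        if parts and parts[0] in wanted:
            yield header, seq
```

The filtered records could then be written back out as a much smaller FASTA file and loaded with whatever sequence loader is already in use. Whether the subject IDs in the BLAST output match the first token of the FASTA def. lines depends on how the database was formatted, so that matching rule is also an assumption.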