From: Steve F. <sfi...@pc...> - 2005-02-11 18:48:37
see below

Alberto Davila wrote:

>We are doing this for Garsa (another system) .. basically we have a
>bioperl parser (Bio::SearchIO) that reads the Blast results file and
>extracts all the needed info (to the "Blast_Hit" table)... and also loads
>into a given table (eg: External_DB) all the sequences (in fasta format)
>presenting similarity with the queries... at the end we have "Blast_Hit"
>and "External_DB" populated with the same script.
>

wow, great. could you make a gus plugin from that?

>Regarding Interpro and Glimmer, the main problem is to know in which
>tables we should load the parsed results ?
>

describe the info you want to store.

steve

>Alberto
>
>On Fri, 2005-02-11 at 13:21 -0500, Y. Thomas Gan wrote:
>
>>I was going to give the same answer steve gave for interpro and gene
>>finding results.
>>
>>For loading sequences into GUS, the dilemma with option 2 is: how do you
>>know which sequence to load when you load (which is before you actually
>>have the similarity result)? One solution would be to initially load
>>complete dataset(s) but delete those without similarity after loading
>>similarity results.
>>
>>-Thomas
>>
>>On Fri, 11 Feb 2005, Steve Fischer wrote:
>>
>>>alberto-
>>>
>>>we've never loaded interpro, so there isn't a plugin.
>>>i believe plasmodb has loaded glimmer results, though i'm not sure. i have
>>>asked a plasmodb developer to answer that question.
>>>
>>>steve
>>>
>>>Alberto Davila wrote:
>>>
>>>>Hey Steve, Thomas,
>>>>
>>>>Thanks a lot for the tips, really helpful.. now, few more questions:
>>>>
>>>>>ok. NR = NRDB
>>>>>
>>>>>the way we have used gus with similarities is that both the query and
>>>>>subject are loaded into gus. As thomas explained, the similarity table
>>>>>captures similarity between sequences that are in gus.
>>>>>our approach has always been to just load (warehouse) the entire subject
>>>>>database (NR, EST) that we are blasting against.
>>>>>
>>>>>the current plugins and blastSimilarity are set up for this.
>>>>>
>>>>>obviously, this takes a lot of disk space. two major efficiencies that we
>>>>>don't currently have plugins for would be:
>>>>> 1. to only store in gus a *reference* to the external sequence (ie, don't
>>>>>store the actgs).
>>>>> 2. only store in gus the sequences that actually have similarities
>>>>>
>>>>Option 2 sounds better for us, since we will be blasting against several
>>>>databases (> 10GB databases)
>>>>
>>>>What about the plugins to load Interpro and "gene finder" (glimmer, etc)
>>>>results ? Are there any at all ?
>>>>
>>>>Cheers, Alberto
>>>>
>>>>>steve
>>>>>
>>>>>Alberto Davila wrote:
>>>>>
>>>>>>All the blastable databases I mentioned are standard databases from NCBI
>>>>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>>>>
>>>>>>NT = nucleotides
>>>>>>
>>>>>>~30000 entries from genbank (genbank format) are loaded into GUS now.
>>>>>>
>>>>>>Not sure about your "NRDB", I know NR from NCBI that is a collection of
>>>>>>amino acid entries, could it be the same ?
>>>>>>
>>>>>>Alberto
>>>>>>
>>>>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>>>>
>>>>>>>(what is NT?)
>>>>>>>
>>>>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
>>>>>>>gus?
>>>>>>>
>>>>>>>steve
>>>>>>>
>>>>>>>Alberto Davila wrote:
>>>>>>>
>>>>>>>>Query:
>>>>>>>>
>>>>>>>>Either sequences from genbank (genbank format) or sequences generated
>>>>>>>>in the lab (fasta format)
>>>>>>>>
>>>>>>>>Blastable databases (all are formatted databases from NCBI):
>>>>>>>>
>>>>>>>>NR
>>>>>>>>NT
>>>>>>>>EST
>>>>>>>>
>>>>>>>>Alberto
>>>>>>>>
>>>>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>>>>
>>>>>>>>>for the blast, what are the query sequences and what are the blastable
>>>>>>>>>databases?
>>>>>>>>>
>>>>>>>>>steve
>>>>>>>>>
>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>
>>>>>>>>>>Basically we will use sequences (loaded into GUS with the GBParser) for
>>>>>>>>>>NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
>>>>>>>>>>used for Interpro analyses. Results of both (Blast and Interpro) will be
>>>>>>>>>>loaded into GUS. We will parse specific things from the Blast results, I
>>>>>>>>>>would say:
>>>>>>>>>>
>>>>>>>>>>`Gi` `Accession` `Description` `E_value` `Score` `Length`
>>>>>>>>>>`Frame_Query` `Frame_Hit` `Identical` `Hsp_Frac_Identical`
>>>>>>>>>>`Conserved` `Hsp_Frac_Conserved` `Query_Start` `Query_End`
>>>>>>>>>>`Hit_Start` `Hit_End` `Hsp_Align` `database_letters`
>>>>>>>>>>`database_entries`
>>>>>>>>>>
>>>>>>>>>>We already have a Bioperl parser for that (specific for another system:
>>>>>>>>>>GARSA) that could be adapted to GUS, problem being we are not sure what
>>>>>>>>>>tables should be used to store those data in GUS.
>>>>>>>>>>
>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>
>>>>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>>>>
>>>>>>>>>>>what are you planning on blasting?
>>>>>>>>>>>
>>>>>>>>>>>steve
>>>>>>>>>>>
>>>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>>>
>>>>>>>>>>>>Hi Steve,
>>>>>>>>>>>>
>>>>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>poliana-
>>>>>>>>>>>>>
>>>>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date.
>>>>>>>>>>>>>it should instruct you to use the blastSimilarity command.
>>>>>>>>>>>>>
>>>>>>>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and
>>>>>>>>>>>>>query sequences are in GUS, and their def. lines have GUS primary
>>>>>>>>>>>>>keys.
>>>>>>>>>>>>>Are your sequences already loaded into GUS?
>>>>>>>>>>>>>
>>>>>>>>>>>>They are not, would there be any howto/tips for that plugin ? We will
>>>>>>>>>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
>>>>>>>>>>>>into GUS... If they are not available, then maybe we will have to write
>>>>>>>>>>>>them ...
>>>>>>>>>>>>
>>>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>>>
>>>>>>>>>>>>>steve
>>>>>>>>>>>>>
>>>>>>>>>>>>>Poliana Mateus wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Hello all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl??
>>>>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Poliana
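
A side note on "option 2" from the quoted discussion (only store the sequences that actually have similarities): one way around the ordering problem Thomas describes could be to filter the subject fasta file against the blast results before loading anything. The following is only a rough Python sketch under stated assumptions, not a GUS plugin and not how any existing GUS tool works. It assumes the blast results are in NCBI tabular format (subject id in the second tab-separated column) and that the first word of each fasta def. line matches that subject id. The function names are made up for illustration.

```python
def subject_ids_from_tabular(blast_lines):
    """Collect subject ids (2nd tab-separated column) from tabular blast output."""
    ids = set()
    for line in blast_lines:
        line = line.strip()
        # Skip blank lines and comment lines (tabular-with-comments output).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def read_fasta(fasta_lines):
    """Yield (defline, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def sequences_with_hits(fasta_lines, blast_lines):
    """Return (id, defline, sequence) for subjects that appear in the results.

    Assumption: the id is the first whitespace-delimited word of the def. line.
    """
    wanted = subject_ids_from_tabular(blast_lines)
    kept = []
    for header, seq in read_fasta(fasta_lines):
        words = header.split()
        seq_id = words[0] if words else ""
        if seq_id in wanted:
            kept.append((seq_id, header, seq))
    return kept


if __name__ == "__main__":
    fasta = [">s1 first subject\n", "ACGT\n", "AC\n", ">s2 second\n", "GGGG\n"]
    blast = ["q1\ts1\t98.0\t6\t0\t0\t1\t6\t1\t6\t1e-5\t12.0\n"]
    for seq_id, defline, seq in sequences_with_hits(fasta, blast):
        print(seq_id, len(seq))
```

Only the records returned by sequences_with_hits would then need to be loaded, at the cost of reading the subject file once more after the blast run. For databases of 10GB and up the fasta file should be passed as an open file handle rather than a list, so it is streamed line by line.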