Re: [Gusdev-gusdev] parseBlastFilesForSimilarity.pl

alberto-

we've never loaded interpro, so there isn't a plugin. 

i believe plasmodb has loaded glimmer results, though i'm not sure.   i 
have asked a plasmodb developer to answer that question.

steve

Alberto Davila wrote:

>Hey Steve, Thomas,
>
>Thanks a lot for the tips, really helpful. Now, a few more questions:
>
>>ok.  NR = NRDB
>>
>>the way we have used gus with similarities is that both the query and 
>>subject are loaded into gus.  As thomas explained, the similarity table 
>>captures similarity between sequences that are in gus. 
>>
>>our approach has always been to just load (warehouse) the entire subject 
>>database (NR, EST) that we are blasting against.
>>
>>the current plugins and blastSimilarity are set up for this.
>>
>>obviously, this takes a lot of disk space.  two major efficiencies that 
>>we don't currently have plugins for would be:
>>  1. to only store in gus a *reference* to the external sequence (ie, 
>>don't store the actgs).
>>  2. only store in gus the sequences that actually have similarities
>>    
>>
>
>Option 2 sounds better for us, since we will be blasting against several
>databases (> 10 GB databases).
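A minimal sketch of option 2 (store in GUS only the subject sequences that actually have hits). It assumes BLAST was run with tabular output (NCBI BLAST+ `-outfmt 6`, or legacy `blastall -m 8`), where column 2 is the subject id and column 11 is the e-value. The 1e-5 cutoff and the function names are illustrative, not part of any GUS plugin, and it assumes the subject ids in the BLAST output match the first word of the deflines in the subject FASTA file.

```python
# Sketch of "option 2": keep only subject sequences that have BLAST hits.
# Assumes tabular BLAST output (-outfmt 6 / blastall -m 8):
# column 2 = subject id, column 11 = e-value. The cutoff is illustrative.
from typing import Iterable, Iterator, Set


def subjects_with_hits(tabular_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return distinct subject ids with at least one hit at or below max_evalue."""
    hits: Set[str] = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        subject_id, evalue = cols[1], float(cols[10])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return hits


def filter_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose id (first word of the defline) is in keep."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keeping = bool(fields) and fields[0] in keep
        if keeping:
            yield line


if __name__ == "__main__":
    blast_out = [
        "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-80\t350\n",
        "q1\tsubjB\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20.0\n",
    ]
    keep = subjects_with_hits(blast_out)
    fasta = [">subjA first subject\n", "ACGT\n", ">subjB second subject\n", "TTTT\n"]
    print("".join(filter_fasta(fasta, keep)), end="")
```

Only the filtered records would then be loaded, which is what keeps the disk footprint proportional to the number of hits rather than to the size of the blasted databases.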
>
>What about plugins to load InterPro and "gene finder" (Glimmer, etc.)
>results? Are there any at all?
>
>Cheers, Alberto
>
>>steve
>>
>>Alberto Davila wrote:
>>
>>>All the blastable databases I mentioned are standard databases from NCBI
>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>
>>>NT = nucleotides
>>>
>>>~30,000 entries from GenBank (GenBank format) are now loaded into GUS.
>>>
>>>I'm not sure about your "NRDB". I know NR from NCBI, which is a collection of
>>>amino acid entries; could it be the same?
>>>
>>>Alberto
>>>
>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>
>>>>(what is NT?)
>>>>
>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into 
>>>>gus?
>>>>
>>>>steve
>>>>
>>>>Alberto Davila wrote:
>>>>
>>>>>Query:
>>>>>
>>>>>Either sequences from GenBank (GenBank format) or sequences generated in
>>>>>the lab (FASTA format)
>>>>>
>>>>>Blastable databases (all are formatted databases from NCBI):
>>>>>
>>>>>NR
>>>>>NT
>>>>>EST
>>>>>
>>>>>Alberto
>>>>>
>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>
>>>>>>for the blast, what are the query sequences and what are the blastable 
>>>>>>databases?
>>>>>>
>>>>>>steve
>>>>>>
>>>>>>Alberto Davila wrote:
>>>>>>
>>>>>>>Basically, we will use sequences (loaded into GUS with the GBParser) for
>>>>>>>NCBI BLAST (blastx, blastp and tblastx); the same sequences will also be
>>>>>>>used for InterPro analyses. The results of both (BLAST and InterPro) will be
>>>>>>>loaded into GUS. From the BLAST results we would parse roughly these
>>>>>>>fields:
>>>>>>>
>>>>>>>`Gi` 
>>>>>>>`Accession` 
>>>>>>>`Description` 
>>>>>>>`E_value` 
>>>>>>>`Score` 
>>>>>>>`Length` 
>>>>>>>`Frame_Query` 
>>>>>>>`Frame_Hit` 
>>>>>>>`Identical` 
>>>>>>>`Hsp_Frac_Identical` 
>>>>>>>`Conserved` 
>>>>>>>`Hsp_Frac_Conserved`
>>>>>>>`Query_Start`
>>>>>>>`Query_End` 
>>>>>>>`Hit_Start` 
>>>>>>>`Hit_End` 
>>>>>>>`Hsp_Align` 
>>>>>>>`database_letters` 
>>>>>>>`database_entries` 
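The fields listed above could be held in a per-HSP record before deciding which GUS tables they map to. This is a hypothetical Python sketch, not a GUS object: it assumes the identity and conservation fractions are taken over the HSP alignment length (BioPerl can compute these fractions relative to other lengths, so the GARSA parser may differ), and it leaves out `database_letters` and `database_entries`, which describe the searched database rather than any single HSP.

```python
# Hypothetical per-HSP record for the BLAST fields listed in the thread.
# Not a GUS table or object; the fractions assume the HSP alignment length
# as the denominator.
from dataclasses import dataclass


@dataclass
class BlastHsp:
    gi: str
    accession: str
    description: str
    e_value: float
    score: float
    length: int          # alignment length of this HSP
    frame_query: int
    frame_hit: int
    identical: int       # identical positions in the alignment
    conserved: int       # conserved (positive-scoring) positions
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    hsp_align: str = ""  # optional alignment text

    @property
    def frac_identical(self) -> float:
        """Identical positions divided by alignment length (assumed denominator)."""
        return self.identical / self.length if self.length else 0.0

    @property
    def frac_conserved(self) -> float:
        """Conserved positions divided by alignment length (assumed denominator)."""
        return self.conserved / self.length if self.length else 0.0


if __name__ == "__main__":
    # Illustrative values for a blastx HSP: 300 query nucleotides align to 100 residues.
    hsp = BlastHsp("0000", "ACC0001", "example protein", 1e-30, 120.0, 100,
                   1, 0, 80, 90, 1, 300, 5, 104)
    print(hsp.frac_identical, hsp.frac_conserved)
```

Keeping derived values such as `Hsp_Frac_Identical` as computed properties avoids storing numbers that can drift out of sync with the counts they come from.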
>>>>>>>
>>>>>>>We already have a BioPerl parser for that (written for another system,
>>>>>>>GARSA) that could be adapted to GUS; the problem is that we are not sure
>>>>>>>which GUS tables should be used to store those data.
>>>>>>>
>>>>>>>Cheers, Alberto
>>>>>>>
>>>>>>>
>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>
>>>>>>>
>>>>>>>    
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>what are you planning on blasting?
>>>>>>>>
>>>>>>>>steve
>>>>>>>>
>>>>>>>>Alberto Davila wrote:
>>>>>>>>
>>>>>>>>>Hi Steve,
>>>>>>>>>
>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>
>>>>>>>>>>poliana-
>>>>>>>>>>
>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date.   it 
>>>>>>>>>>should instruct you to use the blastSimilarity command.
>>>>>>>>>>
>>>>>>>>>>LoadBlastSimFast makes a big assumption: that the subject and query
>>>>>>>>>>sequences are in GUS, and that their deflines contain GUS primary keys.
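If the sequences are loaded first, a FASTA file whose deflines carry GUS primary keys could be prepared along these lines. This is a hypothetical Python sketch: the accession-to-key mapping (`acc_to_pk`) stands for something you would export from your own GUS instance, and the exact defline layout that LoadBlastSimFast and blastSimilarity expect is not stated in this thread, so this version simply puts the key as the first word and keeps the rest of the description.

```python
# Hypothetical prep step: rewrite FASTA deflines so each one starts with the
# sequence's GUS primary key. acc_to_pk stands in for a mapping exported
# from your own GUS instance; the defline layout is an assumption.
from typing import Dict, Iterable, Iterator


def rekey_deflines(fasta_lines: Iterable[str], acc_to_pk: Dict[str, int]) -> Iterator[str]:
    """Replace the first word of each defline with its GUS primary key."""
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            fields = header.split(maxsplit=1)
            if not fields or fields[0] not in acc_to_pk:
                # Fail loudly: LoadBlastSimFast assumes every sequence is already in GUS.
                raise KeyError(f"no GUS primary key for defline: {line.strip()}")
            rest = f" {fields[1]}" if len(fields) > 1 else ""
            line = f">{acc_to_pk[fields[0]]}{rest}\n"
        yield line


if __name__ == "__main__":
    fasta = [">ACC1 lab sequence 1\n", "ACGTACGT\n", ">ACC2\n", "GGCC\n"]
    print("".join(rekey_deflines(fasta, {"ACC1": 101, "ACC2": 102})), end="")
```

Raising on an unknown accession, rather than skipping it, surfaces sequences that were never loaded before any similarity rows are written.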
>>>>>>>>>>
>>>>>>>>>>Are your sequences already loaded into GUS?
>>>>>>>>>>
>>>>>>>>>They are not. Is there any howto or tips for that plugin? We will
>>>>>>>>>certainly need plugins to load "InterPro" and "ORF finding" results
>>>>>>>>>into GUS. If they are not available, then maybe we will have to write
>>>>>>>>>them.
>>>>>>>>>
>>>>>>>>>Cheers, Alberto
>>>>>>>>>
>>>>>>>>>>steve
>>>>>>>>>>
>>>>>>>>>>Poliana Mateus wrote:
>>>>>>>>>>
>>>>>>>>>>>Hello all,
>>>>>>>>>>>
>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl?
>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>
>>>>>>>>>>>Poliana