Re: [Gusdev-gusdev] parseBlastFilesForSimilarity.pl

All the blastable databases I mentioned are standard databases from NCBI
(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):

NT = nucleotides

~30000 entries from genbank (genbank format) are loaded into GUS now.

I'm not sure about your "NRDB". I know NR from NCBI, which is a collection of
amino acid entries; could it be the same?

Alberto

On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
> (what is NT?)
> 
> which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into 
> gus?
> 
> steve
> 
> Alberto Davila wrote:
> 
> >Query:
> >
> >Either sequences from genbank (genbank format) or sequences generated in
> >the lab (fasta format)
> >
> >Blastable databases (all are formatted databases from NCBI):
> >
> >NR
> >NT
> >EST
> >
> >Alberto
> >
> >On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
> >
> >>for the blast, what are the query sequences and what are the blastable 
> >>databases?
> >>
> >>steve
> >>
> >>Alberto Davila wrote:
> >>
> >>>Basically we will use sequences (loaded into GUS with the GBParser) for
> >>>NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
> >>>used for Interpro analyses. Results of both (Blast and Interpro) will be
> >>>loaded into GUS. We will parse specific things from the Blast results, I
> >>>would say:
> >>>
> >>> `Gi` 
> >>> `Accession` 
> >>> `Description` 
> >>> `E_value` 
> >>> `Score` 
> >>> `Length` 
> >>> `Frame_Query` 
> >>> `Frame_Hit` 
> >>> `Identical` 
> >>> `Hsp_Frac_Identical` 
> >>> `Conserved` 
> >>> `Hsp_Frac_Conserved`
> >>> `Query_Start`
> >>> `Query_End` 
> >>> `Hit_Start` 
> >>> `Hit_End` 
> >>> `Hsp_Align` 
> >>> `database_letters` 
> >>> `database_entries` 
> >>>
> >>>We already have a Bioperl parser for that (specific for another system:
> >>>GARSA) that could be adapted to GUS, problem being we are not sure what
> >>>tables should be used to store those data in GUS.
> >>>
> >>>Cheers, Alberto
> >>>
> >>>
> >>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
> >>>
> >>>>what are you planning on blasting?
> >>>>
> >>>>steve
> >>>>
> >>>>Alberto Davila wrote:
> >>>>
> >>>>>Hi Steve,
> >>>>>
> >>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
> >>>>>
> >>>>>>poliana-
> >>>>>>
> >>>>>>oops, the usage statement for LoadBlastSimFast is out of date.   it 
> >>>>>>should instruct you to use the blastSimilarity command.
> >>>>>>
> >>>>>>LoadBlastSimFast makes a big assumption, that the subject and query 
> >>>>>>sequences are in GUS, and their def. lines have GUS primary keys. 
> >>>>>>
> >>>>>>Are your sequences already loaded into GUS?
> >>>>>>
> >>>>>They are not. Would there be any howto/tips for that plugin? We will
> >>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
> >>>>>into GUS... If they are not available, then maybe we will have to write
> >>>>>them ...
> >>>>>
> >>>>>Cheers, Alberto
> >>>>>
> >>>>>>steve
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Poliana Mateus wrote:
> >>>>>>
> >>>>>>>Hello all,
> >>>>>>>
> >>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl?
> >>>>>>>I'm trying to run LoadBlastSimFast...
> >>>>>>>
> >>>>>>>Poliana
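
As an illustration of the field list Alberto gives earlier in the thread, here is a minimal sketch of pulling a subset of those values out of BLAST results. It is not the Bioperl/GARSA parser he mentions, and it does not touch GUS. It assumes the results were written in BLAST's 12-column tabular format (one HSP per line), and that hit identifiers look like `gi|<number>|<db>|<accession>|`. The names `Hsp`, `parse_tabular` and `gi_and_accession` are hypothetical. Fields such as `Description`, `Frame_Query`, `Frame_Hit` and `Conserved` are not present in that format and are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Hsp:
    """One HSP, holding a subset of the fields listed in the thread."""
    query_id: str
    hit_id: str
    frac_identical: float  # Hsp_Frac_Identical, as a fraction in [0, 1]
    length: int            # Length of the alignment
    query_start: int       # Query_Start
    query_end: int         # Query_End
    hit_start: int         # Hit_Start
    hit_end: int           # Hit_End
    e_value: float         # E_value
    score: float           # Score (bit score)


def parse_tabular(lines: Iterable[str]) -> Iterator[Hsp]:
    """Yield one Hsp per data line of 12-column tabular BLAST output.

    Assumed column order: query id, subject id, % identity, alignment
    length, mismatches, gap openings, q.start, q.end, s.start, s.end,
    e-value, bit score. Blank lines and '#' comment lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        yield Hsp(
            query_id=cols[0],
            hit_id=cols[1],
            frac_identical=float(cols[2]) / 100.0,
            length=int(cols[3]),
            query_start=int(cols[6]),
            query_end=int(cols[7]),
            hit_start=int(cols[8]),
            hit_end=int(cols[9]),
            e_value=float(cols[10]),
            score=float(cols[11]),
        )


def gi_and_accession(hit_id: str) -> Tuple[Optional[str], str]:
    """Split an identifier of the assumed form 'gi|123|gb|AB000001.1|'.

    Returns (gi, accession); if the identifier does not have that form,
    returns (None, hit_id) unchanged.
    """
    parts = hit_id.split("|")
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[1], parts[3]
    return None, hit_id
```

Which GUS tables these values should go into is the open question in the thread, so the sketch stops at in-memory records.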