Re: [Gusdev-gusdev] parseBlastFilesForSimilarity.pl

hi Alberto -
The PlasmoDB project uses a plugin to load the GlimmerM results: the
GUS::Common::Plugin::ImportPlasmoDBPrediction plugin in the Sanger CVS
repository. However, please note that this plugin is not generalized;
it has so far been used only for the PlasmoDB project.
It would be useful to generalize this plugin some day, so that everyone
can benefit.

Bindu

On Feb 11, 2005, at 12:44 PM, Alberto Davila wrote:

> Hey Steve, Thomas,
>
> Thanks a lot for the tips, really helpful. Now, a few more questions:
>
>> ok.  NR = NRDB
>>
>> the way we have used gus with similarities is that both the query and
>> subject are loaded into gus.  As Thomas explained, the similarity table
>> captures similarity between sequences that are in gus.
>>
>> our approach has always been to just load (warehouse) the entire subject
>> database (NR, EST) that we are blasting against.
>>
>> the current plugins and blastSimilarity are set up for this.
>>
>> obviously, this takes a lot of disk space.  two major efficiencies that
>> we don't currently have plugins for would be:
>>   1. store in gus only a *reference* to the external sequence (i.e.,
>>      don't store the actgs).
>>   2. store in gus only the sequences that actually have similarities.
>
> Option 2 sounds better for us, since we will be blasting against several
> large (> 10 GB) databases.
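Option 2 above (keep in GUS only the subject sequences that actually have similarities) starts with finding which subjects were hit at all. Below is a minimal Python sketch, independent of any existing GUS plugin, assuming BLAST tabular output (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+) with its 12 default columns; the function name `subjects_with_hits` and the `max_evalue` cutoff are hypothetical. The resulting IDs could then be extracted from the formatted NCBI database (for example with `fastacmd -i`, or `blastdbcmd -entry_batch` in BLAST+) and only those sequences loaded, instead of warehousing the whole database.

```python
from typing import Iterable, Set


def subjects_with_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return the subject IDs having at least one HSP with e-value <= max_evalue.

    Expects BLAST tabular rows with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    subjects: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comment lines (as written by -outfmt 7).
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(cols)}: {line!r}")
        sseqid, evalue = cols[1], float(cols[10])
        if evalue <= max_evalue:
            subjects.add(sseqid)
    return subjects


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        "q1\tgi|111\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-50\t190",
        "q1\tgi|222\t40.0\t80\t48\t2\t10\t89\t1\t80\t0.5\t30",
        "q2\tgi|111\t90.0\t60\t6\t0\t1\t60\t1\t60\t1e-20\t100",
    ]
    print(sorted(subjects_with_hits(rows)))  # ['gi|111']
```

Using a set means a subject hit by many queries is still loaded only once.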
>
> What about plugins to load InterPro and gene-finder (Glimmer, etc.)
> results? Are there any at all?
>
> Cheers, Alberto
>
>>
>> steve
>>
>> Alberto Davila wrote:
>>
>>> All the blastable databases I mentioned are standard databases from NCBI
>>> (ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>
>>> NT = nucleotides
>>>
>>> ~30,000 entries from GenBank (GenBank format) are now loaded into GUS.
>>>
>>> Not sure about your "NRDB". I know NR from NCBI, which is a collection of
>>> amino acid entries; could it be the same?
>>>
>>> Alberto
>>>
>>> On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>
>>>
>>>> (what is NT?)
>>>>
>>>> which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
>>>> gus?
>>>>
>>>> steve
>>>>
>>>> Alberto Davila wrote:
>>>>
>>>>> Query:
>>>>>
>>>>> Either sequences from GenBank (GenBank format) or sequences generated in
>>>>> the lab (FASTA format)
>>>>>
>>>>> Blastable databases (all are formatted databases from NCBI):
>>>>>
>>>>> NR
>>>>> NT
>>>>> EST
>>>>>
>>>>> Alberto
>>>>>
>>>>> On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>
>>>>>> for the blast, what are the query sequences and what are the blastable
>>>>>> databases?
>>>>>>
>>>>>> steve
>>>>>>
>>>>>> Alberto Davila wrote:
>>>>>>
>>>>>>> Basically, we will run NCBI BLAST (BLASTX, BLASTP and TBLASTX) on
>>>>>>> sequences loaded into GUS with the GBParser, and the same sequences will
>>>>>>> also be used for InterPro analyses. Results of both (BLAST and InterPro)
>>>>>>> will be loaded into GUS. We will parse specific fields from the BLAST
>>>>>>> results, I would say:
>>>>>>>
>>>>>>> `Gi`
>>>>>>> `Accession`
>>>>>>> `Description`
>>>>>>> `E_value`
>>>>>>> `Score`
>>>>>>> `Length`
>>>>>>> `Frame_Query`
>>>>>>> `Frame_Hit`
>>>>>>> `Identical`
>>>>>>> `Hsp_Frac_Identical`
>>>>>>> `Conserved`
>>>>>>> `Hsp_Frac_Conserved`
>>>>>>> `Query_Start`
>>>>>>> `Query_End`
>>>>>>> `Hit_Start`
>>>>>>> `Hit_End`
>>>>>>> `Hsp_Align`
>>>>>>> `database_letters`
>>>>>>> `database_entries`
>>>>>>>
>>>>>>> We already have a BioPerl parser for that (written for another system,
>>>>>>> GARSA) that could be adapted to GUS; the problem is that we are not sure
>>>>>>> which tables should be used to store those data in GUS.
>>>>>>>
>>>>>>> Cheers, Alberto
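For the per-HSP fields listed above, one hedged option, before settling on GUS tables, is to collect them in a plain record. The Python sketch below does that for a subset of the fields (`Hsp_Align`, `database_letters` and `database_entries` are left out). The class name `BlastHsp` and the fraction definition (matching positions divided by alignment `length`) are assumptions made for illustration; they are not taken from BioPerl or from the GUS schema.

```python
from dataclasses import dataclass


@dataclass
class BlastHsp:
    """One HSP, with field names following the list in the message (hypothetical class)."""
    gi: str
    accession: str
    description: str
    e_value: float
    score: float
    length: int          # alignment length
    frame_query: int
    frame_hit: int
    identical: int       # number of identical positions
    conserved: int       # number of conserved (positive) positions
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int

    @property
    def hsp_frac_identical(self) -> float:
        """Identical positions / alignment length (assumed definition); 0.0 if length is 0."""
        return self.identical / self.length if self.length else 0.0

    @property
    def hsp_frac_conserved(self) -> float:
        """Conserved positions / alignment length (assumed definition); 0.0 if length is 0."""
        return self.conserved / self.length if self.length else 0.0


if __name__ == "__main__":
    # Made-up values, for illustration only.
    hsp = BlastHsp(
        gi="0", accession="EXAMPLE1", description="made-up example",
        e_value=1e-30, score=120.0, length=200, frame_query=1, frame_hit=0,
        identical=150, conserved=180,
        query_start=1, query_end=600, hit_start=1, hit_end=200,
    )
    print(hsp.hsp_frac_identical, hsp.hsp_frac_conserved)  # 0.75 0.9
```

Deriving the fractions from the stored counts, rather than storing them separately, keeps the two from drifting out of sync.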
>>>>>>>
>>>>>>> On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>
>>>>>>>> what are you planning on blasting?
>>>>>>>>
>>>>>>>> steve
>>>>>>>>
>>>>>>>> Alberto Davila wrote:
>>>>>>>>
>>>>>>>>> Hi Steve,
>>>>>>>>>
>>>>>>>>> On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>
>>>>>>>>>> poliana-
>>>>>>>>>>
>>>>>>>>>> oops, the usage statement for LoadBlastSimFast is out of date.   it
>>>>>>>>>> should instruct you to use the blastSimilarity command.
>>>>>>>>>>
>>>>>>>>>> LoadBlastSimFast makes a big assumption, that the subject and query
>>>>>>>>>> sequences are in GUS, and their def. lines have GUS primary keys.
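Because LoadBlastSimFast expects GUS primary keys in the def lines, sequences loaded into GUS first would need their FASTA def lines rewritten before blasting. The Python sketch below illustrates this under loudly stated assumptions: that the key must be the first whitespace-delimited token after `>`, and that `id_map` (hypothetical) maps each sequence's original first token to the primary key it received in GUS. The exact def-line format LoadBlastSimFast and blastSimilarity expect is not given here and should be checked.

```python
from typing import Dict, Iterable, Iterator


def add_gus_ids(fasta_lines: Iterable[str], id_map: Dict[str, int]) -> Iterator[str]:
    """Yield FASTA lines with each def line prefixed by its (assumed) GUS primary key.

    Sequence lines pass through unchanged; the original def line text is kept
    after the key so the accession and description are not lost.
    """
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if not line.startswith(">"):
            yield line
            continue
        defline = line[1:]
        tokens = defline.split(maxsplit=1)
        token = tokens[0] if tokens else ""
        if token not in id_map:
            raise KeyError(f"no GUS primary key for {token!r}; is this sequence loaded?")
        yield f">{id_map[token]} {defline}"


if __name__ == "__main__":
    # Made-up records and keys, for illustration only.
    fasta = [">SEQ1 first sequence", "ACGT", ">SEQ2 second sequence", "GGCC"]
    for out in add_gus_ids(fasta, {"SEQ1": 101, "SEQ2": 102}):
        print(out)
```

Raising on an unknown ID, instead of skipping it, surfaces sequences that were never loaded into GUS before any BLAST time is spent on them.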
>>>>>>>>>>
>>>>>>>>>> Are your sequences already loaded into GUS?
>>>>>>>>>>
>>>>>>>>> They are not. Are there any how-tos or tips for that plugin? We will
>>>>>>>>> certainly need plugins to load InterPro and ORF-finding results into
>>>>>>>>> GUS... If they are not available, then maybe we will have to write them.
>>>>>>>>>
>>>>>>>>> Cheers, Alberto
>>>>>>>>>
>>>>>>>>>> steve
>>>>>>>>>>
>>>>>>>>>> Poliana Mateus wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>
>>>>>>>>>>> Where can I find the script parseBlastFilesForSimilarity.pl?
>>>>>>>>>>> I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>
>>>>>>>>>>> Poliana
>>>>>>>>>>>
>
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev