gusdev-gusdev Mailing List for Genomic Unified Schema Development (Page 47)


see below

Alberto Davila wrote:

>We are doing this for Garsa (another system): basically we have a
>bioperl parser (Bio::Search::IO) that reads the Blast results file and
>extracts all the needed info (into the "Blast_Hit" table), and also loads
>into a given table (e.g. External_DB) all the sequences (in fasta format)
>showing similarity with the queries. At the end we have "Blast_Hit"
>and "External_DB" populated by the same script.
>
>  
>
wow, great.  could you make a gus plugin from that?

>Regarding Interpro and Glimmer, the main problem is knowing which
>tables we should load the parsed results into.
>
>  
>
describe the info you want to store.

steve
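
Editorial sketch (not part of the original message): the single-script approach Alberto describes above, filling a hit table and a table holding only the matching subject sequences in one pass, might look roughly like the following. This is a Python stand-in, not the GARSA Bio::Search::IO parser. It assumes BLAST tabular output with the standard 12 columns and a FASTA text of subject sequences; the function names are hypothetical, and the returned structures only mimic what would go into "Blast_Hit" and "External_DB".

```python
# Standard 12 columns of BLAST tabular output (assumed input format).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(tabular_text):
    """Return one dict per hit line, with numeric fields converted."""
    rows = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def read_fasta(fasta_text):
    """Map the first word of each FASTA header to its sequence."""
    seqs = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def build_tables(tabular_text, fasta_text):
    """Hypothetical helper: rows for a hit table plus only the subject
    sequences that actually appear in at least one hit."""
    hits = parse_hits(tabular_text)
    subjects = read_fasta(fasta_text)
    hit_ids = {h["sseqid"] for h in hits}
    external_db = {sid: subjects[sid] for sid in sorted(hit_ids) if sid in subjects}
    return hits, external_db
```

Because the subject sequences are selected from the parsed hits, sequences without any similarity never reach the second table, which is the saving discussed as option 2 further down the thread.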

>Alberto
>
>On Fri, 2005-02-11 at 13:21 -0500, Y. Thomas Gan wrote:
>  
>
>>I was going to give the same answer steve gave for interpro and gene 
>>finding results.
>>
>>For loading sequences into GUS, the dilemma with option 2 is: how do you 
>>know which sequence to load when you load (which is before you actually 
>>have the similarity result)? One solution would be to initially load 
>>complete dataset(s) but delete those without similarity after loading 
>>similarity results.
>>
>>-Thomas
>>
>>On Fri, 11 Feb 2005, Steve Fischer wrote:
>>
>>    
>>
>>>alberto-
>>>
>>>we've never loaded interpro, so there isn't a plugin. 
>>>i believe plasmodb has loaded glimmer results, though i'm not sure.   i have 
>>>asked a plasmodb developer to answer that question.
>>>
>>>steve
>>>
>>>Alberto Davila wrote:
>>>
>>>      
>>>
>>>>Hey Steve, Thomas,
>>>>
>>>>Thanks a lot for the tips, really helpful.. now, few more questions:
>>>>
>>>>
>>>>        
>>>>
>>>>>ok.  NR = NRDB
>>>>>
>>>>>the way we have used gus with similarities is that both the query and 
>>>>>subject are loaded into gus.  As thomas explained, the similarity table 
>>>>>captures similarity between sequences that are in gus. 
>>>>>our approach has always been to just load (warehouse) the entire subject 
>>>>>database (NR, EST) that we are blasting against.
>>>>>
>>>>>the current plugins and blastSimilarity are set up for this.
>>>>>
>>>>>obviously, this takes a lot of disk space.  two major efficiencies that we 
>>>>>don't currently have plugins for would be:
>>>>> 1. to only store in gus a *reference* to the external sequence (ie, don't 
>>>>>store the actgs).
>>>>> 2. only store in gus the sequences that actually have similarities
>>>>>
>>>>>          
>>>>>
>>>>Option 2 sounds better for us, since we will be blasting against several
>>>>databases (> 10GB databases)
>>>>
>>>>What about plugins to load Interpro and "gene finder" (glimmer, etc.)
>>>>results? Are there any at all?
>>>>
>>>>Cheers, Alberto
>>>>
>>>>
>>>>        
>>>>
>>>>>steve
>>>>>
>>>>>Alberto Davila wrote:
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>All the blastable databases I mentioned are standard databases from NCBI
>>>>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>>>>
>>>>>>NT = nucleotides
>>>>>>
>>>>>>~30000 entries from genbank (genbank format) are loaded into GUS now.
>>>>>>
>>>>>>Not sure about your "NRDB". I know NR from NCBI, which is a collection of
>>>>>>amino acid entries; could it be the same?
>>>>>>
>>>>>>Alberto
>>>>>>
>>>>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>(what is NT?)
>>>>>>>
>>>>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into 
>>>>>>>gus?
>>>>>>>
>>>>>>>steve
>>>>>>>
>>>>>>>Alberto Davila wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>Query:
>>>>>>>>
>>>>>>>>Either sequences from genbank (genbank format) or sequences generated
>>>>>>>>in the lab (fasta format)
>>>>>>>>
>>>>>>>>Blastable databases (all are formatted databases from NCBI):
>>>>>>>>
>>>>>>>>NR
>>>>>>>>NT
>>>>>>>>EST
>>>>>>>>
>>>>>>>>Alberto
>>>>>>>>
>>>>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>for the blast, what are the query sequences and what are the blastable 
>>>>>>>>>databases?
>>>>>>>>>
>>>>>>>>>steve
>>>>>>>>>
>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>Basically we will use sequences (loaded into GUS with the GBParser)
>>>>>>>>>>for NCBI Blast (Blastx, Blastp and TBlastX); the same sequences will
>>>>>>>>>>also be used for Interpro analyses. Results of both (Blast and
>>>>>>>>>>Interpro) will be loaded into GUS. We will parse specific things from
>>>>>>>>>>the Blast results, I would say:
>>>>>>>>>>
>>>>>>>>>>`Gi` `Accession` `Description` `E_value` `Score` `Length`
>>>>>>>>>>`Frame_Query` `Frame_Hit` `Identical` `Hsp_Frac_Identical`
>>>>>>>>>>`Conserved` `Hsp_Frac_Conserved` `Query_Start` `Query_End`
>>>>>>>>>>`Hit_Start` `Hit_End` `Hsp_Align` `database_letters`
>>>>>>>>>>`database_entries`
>>>>>>>>>>
>>>>>>>>>>We already have a Bioperl parser for that (specific to another
>>>>>>>>>>system, GARSA) that could be adapted to GUS; the problem is that we
>>>>>>>>>>are not sure which tables should be used to store those data in GUS.
>>>>>>>>>>
>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>>>what are you planning on blasting?
>>>>>>>>>>>
>>>>>>>>>>>steve
>>>>>>>>>>>
>>>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>>>
>>>>>>>>>>>>Hi Steve,
>>>>>>>>>>>>
>>>>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>poliana-
>>>>>>>>>>>>>
>>>>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date. 
>>>>>>>>>>>>>it should instruct you to use the blastSimilarity command.
>>>>>>>>>>>>>
>>>>>>>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and 
>>>>>>>>>>>>>query sequences are in GUS, and their def. lines have GUS primary 
>>>>>>>>>>>>>keys. 
>>>>>>>>>>>>>Are your sequences already loaded into GUS?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>They are not. Would there be any howto/tips for that plugin? We will
>>>>>>>>>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
>>>>>>>>>>>>into GUS... If they are not available, then maybe we will have to
>>>>>>>>>>>>write them...
>>>>>>>>>>>>
>>>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>>>
>>>>>>>>>>>>>steve
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Poliana Mateus wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Hello all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl?
>>>>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Poliana
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                            
>>>>>>>>>>>>>>
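
Editorial sketch (not part of the original thread): Thomas's suggestion, to load the complete subject dataset first and then delete the sequences that turned out to have no similarity, can be illustrated with a small in-memory SQLite example. The table and column names here (`subject_sequence`, `similarity`, `seq_id`, `subject_id`) are hypothetical and are not the GUS schema; this only shows the shape of the pruning step under those assumptions.

```python
import sqlite3


def prune_unmatched(conn):
    """Delete subject sequences that no similarity row references.

    Returns the number of deleted rows. Table names are hypothetical.
    """
    cur = conn.execute(
        "DELETE FROM subject_sequence "
        "WHERE seq_id NOT IN (SELECT subject_id FROM similarity)"
    )
    conn.commit()
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subject_sequence (
            seq_id INTEGER PRIMARY KEY, source_id TEXT, residues TEXT);
        CREATE TABLE similarity (
            query_id INTEGER, subject_id INTEGER NOT NULL, score REAL);
    """)
    # Step 1: load the complete subject dataset.
    conn.executemany(
        "INSERT INTO subject_sequence VALUES (?, ?, ?)",
        [(1, "s1", "ACGT"), (2, "s2", "GGGG"), (3, "s3", "TTTT")],
    )
    # Step 2: load the similarity results, which reference subjects by key.
    conn.executemany(
        "INSERT INTO similarity VALUES (?, ?, ?)",
        [(100, 1, 350.0), (101, 3, 120.0)],
    )
    # Step 3: remove the subjects that were never hit.
    removed = prune_unmatched(conn)
    remaining = [row[0] for row in conn.execute(
        "SELECT source_id FROM subject_sequence ORDER BY seq_id")]
    conn.close()
    return removed, remaining
```

The trade-off is that the full subject database still has to be loaded once, so the disk-space saving only appears after the pruning step has run.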



gusdev-gusdev — Topics concerning GUS development