From: Steve F. <sfi...@pc...> - 2005-02-14 20:53:37
Poliana-
the only blast plugins we have are LoadBlastSimFast and
LoadBlastSimilarityPK.
the only tables are Similarity and SimilaritySpan
steve
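
A rough sketch of what this means for the field list quoted below, going by Poliana's observation that the similarity tables take only numeric data: the text fields (query_name, name, accession, description) would have to be kept apart from the numeric ones. The Python below only separates the two groups. It is an illustration, not GUS code: query_start, query_end, hit_start and hit_end stand in for start('query') and so on, the function names split_hit and to_number are made up for this example, and no mapping to actual GUS column names is implied.

```python
# Field names are taken from the list in the quoted message; the grouping
# into "text" and "numeric" is an assumption made for illustration.
TEXT_FIELDS = ("query_name", "name", "accession", "description")
NUMERIC_FIELDS = (
    "significance", "raw_score", "length",
    "num_identical", "frac_identical",
    "num_conserved", "frac_conserved",
    "query_start", "query_end", "hit_start", "hit_end",
)


def to_number(value):
    """Convert a parsed value such as '42' or '1e-50' to int or float."""
    text = str(value)
    try:
        return int(text)
    except ValueError:
        return float(text)


def split_hit(record):
    """Split one parsed hit into (text fields, numeric fields)."""
    text = {k: record[k] for k in TEXT_FIELDS if k in record}
    numeric = {k: to_number(record[k]) for k in NUMERIC_FIELDS if k in record}
    return text, numeric


if __name__ == "__main__":
    # Made-up values, only to show the shape of the data.
    example = {
        "query_name": "query_1",
        "accession": "ABC123",
        "description": "hypothetical protein",
        "significance": "1e-50",
        "raw_score": "200",
        "query_start": "5",
        "query_end": "120",
    }
    text_part, numeric_part = split_hit(example)
    print(text_part)
    print(numeric_part)
```

Under the approach described further down the thread, where both query and subject sequences are loaded into GUS, the text fields would presumably live with those sequence records rather than with the similarity rows.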
Poliana Mateus wrote:
>Hi Steve
>
>I need to insert data (Blast results) into GUS, such as:
>
>----------------------------------------------------
>data extracted by our script
>----------------------------------------------------
>query_name
>name
>accession
>description
>significance
>raw_score
>length
>num_identical
>frac_identical
>num_conserved
>frac_conserved
>start('query')
>end('query')
>start('hit')
>end('hit')
>----------------------------------------------------
>
>Analyzing the LoadBlastSimFast plugin, I verified that it inserts into the
>tables DoTs.Similarity and DoTs.SimilaritySpan, both of which accept only
>numeric data.
>Are there other tables in GUS that store Blast results?
>
>Poliana
>
>
>
>
>
>
>On Fri, 11 Feb 2005 13:50:32 -0500, Steve Fischer
><sfi...@pc...> wrote:
>
>
>>see below
>>
>>Alberto Davila wrote:
>>
>>
>>
>>>We are doing this for Garsa (another system). Basically we have a
>>>bioperl parser (Bio::SearchIO) that reads the Blast results file and
>>>extracts all the needed info (to the "Blast_Hit" table), and also loads
>>>into a given table (eg: External_DB) all the sequences (in fasta format)
>>>presenting similarity with the queries. At the end we have "Blast_Hit"
>>>and "External_DB" populated by the same script.
>>>
>>>
>>>
>>>
>>>
>>wow, great. could you make a gus plugin from that?
>>
>>
>>
>>>Regarding Interpro and Glimmer, the main problem is to know in which
>>>tables we should load the parsed results ?
>>>
>>>
>>>
>>>
>>>
>>describe the info you want to store.
>>
>>steve
>>
>>
>>
>>>Alberto
>>>
>>>On Fri, 2005-02-11 at 13:21 -0500, Y. Thomas Gan wrote:
>>>
>>>
>>>
>>>
>>>>I was going to give the same answer steve gave for interpro and gene
>>>>finding results.
>>>>
>>>>For loading sequences into GUS, the dilemma with option 2 is: how do you
>>>>know which sequence to load when you load (which is before you actually
>>>>have the similarity result)? One solution would be to initially load
>>>>complete dataset(s) but delete those without similarity after loading
>>>>similarity results.
>>>>
>>>>-Thomas
>>>>
>>>>On Fri, 11 Feb 2005, Steve Fischer wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>alberto-
>>>>>
>>>>>we've never loaded interpro, so there isn't a plugin.
>>>>>i believe plasmodb has loaded glimmer results, though i'm not sure. i have
>>>>>asked a plasmodb developer to answer that question.
>>>>>
>>>>>steve
>>>>>
>>>>>Alberto Davila wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Hey Steve, Thomas,
>>>>>>
>>>>>>Thanks a lot for the tips, really helpful.. now, few more questions:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>ok. NR = NRDB
>>>>>>>
>>>>>>>the way we have used gus with similarities is that both the query and
>>>>>>>subject are loaded into gus. As thomas explained, the similarity table
>>>>>>>captures similarity between sequences that are in gus.
>>>>>>>our approach has always been to just load (warehouse) the entire subject
>>>>>>>database (NR, EST) that we are blasting against.
>>>>>>>
>>>>>>>the current plugins and blastSimilarity are set up for this.
>>>>>>>
>>>>>>>obviously, this takes a lot of disk space. two major efficiencies that we
>>>>>>>don't currently have plugins for would be:
>>>>>>>1. to only store in gus a *reference* to the external sequence (ie, don't
>>>>>>>store the actgs).
>>>>>>>2. only store in gus the sequences that actually have similarities
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>Option 2 sounds better for us, since we will be blasting against several
>>>>>>databases (> 10GB databases)
>>>>>>
>>>>>>What about the plugins to load Interpro and "gene finder" (glimmer, etc)
>>>>>>results ? Is there any at all ?
>>>>>>
>>>>>>Cheers, Alberto
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>steve
>>>>>>>
>>>>>>>Alberto Davila wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>All the blastable databases I mentioned are standard databases from NCBI
>>>>>>>>(ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.txt):
>>>>>>>>
>>>>>>>>NT = nucleotides
>>>>>>>>
>>>>>>>>~30000 entries from genbank (genbank format) are loaded into GUS now.
>>>>>>>>
>>>>>>>>Not sure about your "NRDB". I know NR from NCBI, which is a collection of
>>>>>>>>amino acid entries. Could it be the same?
>>>>>>>>
>>>>>>>>Alberto
>>>>>>>>
>>>>>>>>On Fri, 2005-02-11 at 10:43 -0500, Steve Fischer wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>(what is NT?)
>>>>>>>>>
>>>>>>>>>which of these (genbank, your fasta, NRDB, NT, EST) have you loaded into
>>>>>>>>>gus?
>>>>>>>>>
>>>>>>>>>steve
>>>>>>>>>
>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>Query:
>>>>>>>>>>
>>>>>>>>>>Either sequences from genbank (genbank format) or sequences generated in
>>>>>>>>>>the lab (fasta format)
>>>>>>>>>>
>>>>>>>>>>Blastable databases (all are formatted databases from NCBI):
>>>>>>>>>>
>>>>>>>>>>NR
>>>>>>>>>>NT
>>>>>>>>>>EST
>>>>>>>>>>
>>>>>>>>>>Alberto
>>>>>>>>>>
>>>>>>>>>>On Fri, 2005-02-11 at 10:34 -0500, Steve Fischer wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>for the blast, what are the query sequences and what are the blastable
>>>>>>>>>>>databases?
>>>>>>>>>>>
>>>>>>>>>>>steve
>>>>>>>>>>>
>>>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>Basically we will use sequences (loaded into GUS with the GBParser) for
>>>>>>>>>>>>NCBI Blast (Blastx, Blastp and TBlastX), the same sequences will be also
>>>>>>>>>>>>used for Interpro analyses. Results of both (Blast and Interpro) will be
>>>>>>>>>>>>loaded into GUS. We will parse specific things from the Blast results, I
>>>>>>>>>>>>would say:
>>>>>>>>>>>>
>>>>>>>>>>>>`Gi` `Accession` `Description` `E_value` `Score` `Length`
>>>>>>>>>>>>`Frame_Query` `Frame_Hit` `Identical` `Hsp_Frac_Identical`
>>>>>>>>>>>>`Conserved` `Hsp_Frac_Conserved` `Query_Start` `Query_End`
>>>>>>>>>>>>`Hit_Start` `Hit_End` `Hsp_Align` `database_letters`
>>>>>>>>>>>>`database_entries`
>>>>>>>>>>>>
>>>>>>>>>>>>We already have a Bioperl parser for that (specific for another system:
>>>>>>>>>>>>GARSA) that could be adapted to GUS, problem being we are not sure what
>>>>>>>>>>>>tables should be used to store those data in GUS.
>>>>>>>>>>>>
>>>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>On Fri, 2005-02-11 at 10:06 -0500, Steve Fischer wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>what are you planning on blasting?
>>>>>>>>>>>>>
>>>>>>>>>>>>>steve
>>>>>>>>>>>>>
>>>>>>>>>>>>>Alberto Davila wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Hi Steve,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>On Fri, 2005-02-11 at 08:56 -0500, Steve Fischer wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>poliana-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>oops, the usage statement for LoadBlastSimFast is out of date.
>>>>>>>>>>>>>>>it should instruct you to use the blastSimilarity command.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>LoadBlastSimFast makes a big assumption, that the subject and
>>>>>>>>>>>>>>>query sequences are in GUS, and their def. lines have GUS primary
>>>>>>>>>>>>>>>keys.
>>>>>>>>>>>>>>>Are your sequences already loaded into GUS?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>They are not. Would there be any howto/tips for that plugin? We will
>>>>>>>>>>>>>>certainly need a plugin to load "Interpro" and "ORF finding" results
>>>>>>>>>>>>>>into GUS... If they are not available, then maybe we will have to write
>>>>>>>>>>>>>>them ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Cheers, Alberto
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>steve
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Poliana Mateus wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Where can I find the script parseBlastFilesForSimilarity.pl?
>>>>>>>>>>>>>>>>I'm trying to run LoadBlastSimFast...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Poliana
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>