[Gusdev-gusdev] GUS & bioperl


folks-

We should immediately explore a GUS <--> BioPerl adaptor.

We would use it for:
   - GenBank and TIGR XML -> GUS
   - GUS -> GBrowse
   - possibly GUS -> Chado

Here is what Aaron has to say about parsing GenBank, etc.:

Bio::SeqIO::genbank is the BioPerl parser for GenBank; it parses and
represents all of it (split between Bio::Seq [sequence, id, accession,
etc.], Bio::SeqFeature [everything found in the feature table] and
Bio::Annotation [comments, references, etc.] objects).  Similar parsers
exist for practically all common sequence formats (including TIGR XML
and other formats relevant to genome annotation).

Here is what Aaron has to say about GBrowse:

IMO, the "best" way to generate (valid) GFF is to use BioPerl's tools
for GFF manipulation: Bio::Tools::GFF in older BioPerl releases, and
Bio::FeatureIO in the latest development release (due out any day
now, as soon as I stop reading my email; for now, you can get it from
CVS).

To use these tools, you build Bio::SeqFeature objects that represent
the items you wish to dump as GFF; thus you can build complicated
hierarchies of gene models, exons, CDS, UTR, etc., adding deeply
structured attributes/annotations to each, and let the BioPerl GFF code
figure out how to represent it (in GFF2 or GFF3) so that other tools
(including GBrowse) can read it.