From: Steve F. <sfi...@pc...> - 2004-11-05 16:24:58
Folks,

We should immediately explore a GUS <--> BioPerl adaptor. We would use it for:

- GenBank and TIGR XML -> GUS
- GUS -> GBrowse
- possibly GUS -> Chado

Here is what Aaron has to say about parsing GenBank, etc.:

Bio::SeqIO::genbank is the BioPerl parser for GenBank; it parses and represents all of it (split between Bio::Seq [sequence, id, accession, etc.], Bio::SeqFeature [everything found in the feature table] and Bio::Annotation [comments, references, etc.] objects). Similar parsers exist for practically all common sequence formats (including TIGR-XML and other genome annotation-relevant formats).

Here is what Aaron has to say about GBrowse:

IMO, the "best" way to generate (valid) GFF is to use BioPerl's tools for GFF manipulation: Bio::Tools::GFF in older BioPerl releases, and Bio::FeatureIO in the latest development release (due out any day now, as soon as I stop reading my email; for now, you can get it from CVS). To use these tools, you build Bio::SeqFeature objects that represent the items you wish to dump as GFF. You can thus build complicated hierarchies of gene models, exons, CDS, UTR, etc., adding deeply structured attributes/annotations to each, and let the BioPerl GFF code figure out how to represent it (in GFF2 or GFF3) so that other tools (including GBrowse) can read it.
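
To make the GFF idea concrete, here is a rough, language-neutral sketch in Python of the general approach: build a hierarchy of feature objects and let a small writer flatten it into GFF3 lines with Parent links. This is not BioPerl code and not how Bio::Tools::GFF or Bio::FeatureIO are implemented. The Feature class, the to_gff3 and escape functions, the "GUS" source label, and the example coordinates are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Feature:
    """Hypothetical stand-in for a sequence feature (not a BioPerl class)."""
    seqid: str
    type: str
    start: int            # 1-based, inclusive coordinates, as in GFF3
    end: int
    strand: str = "+"
    phase: str = "."      # "0", "1" or "2" for CDS features, "." otherwise
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def escape(value):
    """Percent-encode characters that are reserved in GFF3 column 9 (; = & , etc.)."""
    return quote(str(value), safe=" :/")


def to_gff3(feature, parent_id=None, source="GUS"):
    """Return GFF3 lines for a feature and, recursively, all of its children."""
    attrs = dict(feature.attributes)
    if parent_id is not None:
        # Link the child to its parent through the Parent attribute.
        attrs["Parent"] = parent_id
    column9 = ";".join(f"{key}={escape(val)}" for key, val in attrs.items()) or "."
    line = "\t".join([
        feature.seqid, source, feature.type,
        str(feature.start), str(feature.end),
        ".",                      # score: not used in this sketch
        feature.strand, feature.phase, column9,
    ])
    lines = [line]
    for child in feature.children:
        lines.extend(to_gff3(child, attrs.get("ID"), source))
    return lines


# Made-up gene model: gene -> mRNA -> exon + CDS.
gene = Feature("chr1", "gene", 1000, 2000,
               attributes={"ID": "gene1", "Name": "abc1"},
               children=[
                   Feature("chr1", "mRNA", 1000, 2000,
                           attributes={"ID": "mrna1"},
                           children=[
                               Feature("chr1", "exon", 1000, 1200,
                                       attributes={"ID": "exon1"}),
                               Feature("chr1", "CDS", 1050, 1200, phase="0",
                                       attributes={"ID": "cds1"}),
                           ]),
               ])

if __name__ == "__main__":
    print("##gff-version 3")
    print("\n".join(to_gff3(gene)))
```

The point of the design is the one Aaron makes: the caller only describes the feature hierarchy and its attributes, and the writer decides how that is laid out in the nine GFF columns. In a real adaptor the BioPerl writer would play the role of to_gff3, and the feature objects would be populated from GUS.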