From: Ed R. <ed_...@be...> - 2004-11-05 17:17:34
The first step in writing such an adapter needs to be a document,
though, that shows which fields, in which formats, go where in GUS. One
of the main problems with the parsers is that they have been developed
without a common document saying what kind of information goes where.
To this end, I have a simple analysis of where our TCruzi and Crypto
data are being loaded by the different parsers. I am attaching copies
of these two brief documents in MS Word format. Presently this analysis
is in Open Office format. I would like to use these to start developing
a data-destination document that we can use as a standard for all
further parser development.

Also, I am not sure that this solution is really necessary for GFF
format. Writing a GFF adapter involves two steps: (1) querying the data
and (2) passing it correctly to BioPerl. The solution we have so far is
simply to put the formatting information in the SQL query, so it is one
step. Of course, this solution is ignorant of the GUS object model. It
would be nice to embed this process in an object that maps from GUS
objects to BioPerl, for a number of reasons. But I also think it might
be something to put off until later, since the formatted SQL query is a
quick-and-dirty time saver.

-Ed

>
> From: Steve Fischer <sfi...@pc...>
> Date: 2004/11/05 Fri AM 11:25:32 EST
> To: gusdev-gusdev <gus...@li...>,
>     "Aaron J. Mackey" <am...@pc...>
> Subject: [Gusdev-gusdev] GUS & bioperl
>
> folks-
>
> We should immediately explore a GUS <--> bioperl adaptor.
>
> we would use it for:
>   - Genbank and TIGR XML -> GUS
>   - GUS -> GBrowse
>   - possibly GUS -> Chado
>
> Here is what Aaron has to say about parsing Genbank, etc:
>
> Bio::SeqIO::GenBank is the BioPerl parser for GenBank; it parses and
> represents all of it (split between Bio::Seq [sequence, id, accession,
> etc], Bio::SeqFeature [everything found in the feature table] and
> Bio::Annotation [comments, references, etc] objects). Similar parsers
> exist for practically all common sequence formats (including TIGR-XML
> and other genome annotation-relevant formats).
>
> Here is what Aaron has to say about GBrowse:
>
> IMO, the "best" way to generate (valid) GFF is to use BioPerl's tools
> for GFF manipulation: Bio::Tools::GFF in older BioPerl's, and
> Bio::Feature::IO in the latest development release (due out any day
> now, as soon as I stop reading my email; for now, you can get it from
> CVS).
>
> To use these tools, you build Bio::SeqFeature objects that represent
> the items you wish to dump as GFF; thus you can build complicated
> hierarchies of gene models, exons, CDS, UTR, etc, adding deeply
> structured attributes/annotations to each, and let the BioPerl GFF code
> figure out how to represent it (in GFF2 or GFF3) so that other tools
> (including Gbrowse) can read it.
>

Ed Robinson
255 Deerfield Rd
Bogart, GA 30622
(706)425-9181

Sweet Caroline good times never seemed so good.
I've been inclined to believe they never would.
--Neil Diamond

We're just a bunch of idiots.
--Johnny Damon