From: Paul M. <pj...@sa...> - 2004-11-05 20:46:43
On 5 Nov 2004, at 17:40, Steve Fischer wrote:

> about gbrowse.  it is good that Haiming has put together a prototype
> for loading gus data into gbrowse.
>
> but, as Aaron points out, we will likely be putting sophisticated
> data into gbrowse.  i would rather start on a strong foundation than
> invest resources into a solution that we will grow out of.
>
> it should not be hard to put gus data into bioperl.
>

We would welcome GUS to Bioperl software :)

Ed has a point: mapping GUS objects to bioperl objects and back again
needs some thought. We put Bioperl data into GUS and we've
retrospectively documented what goes where so all developers understand
how things work. It also highlights anything that is "missing".

I hope GFF output has improved in the latest CVS version of Bioperl;
the stable 1.4 version was not up to scratch for me, so I just wrote my
own :(

> steve
>
> Ed Robinson wrote:
>
> The first step in writing such an adapter needs to be a document,
> though, which shows what fields, in what formats, go where in GUS.
> One of the main problems with the parsers is that they have been
> developed without a common document saying what kind of information
> goes where.
>
> To this end, I have a simple analysis of where our TCruzi and Crypto
> data are being loaded by the different parsers. I am attaching copies
> of these two brief documents in MS Word format. Presently this
> analysis is in Open Office format. I would like to use these to start
> developing a data-destination document that we can use as a standard
> for all further parser development.
>
> Also, I am not sure that this solution is really necessary for GFF
> format. Writing a GFF adapter involves two steps: 1. querying the
> data and 2. passing it correctly to BioPerl. The solution we have so
> far is simply to put the formatting information in the SQL query (it's
> one step).
> Of course this is a solution that is ignorant of the GUS object
> model. It would be nice to embed this process in an object which
> maps from GUS objects to BioPerl, for a number of reasons. But I also
> think it might be something to put off until later, since the
> formatted SQL query is a quick-and-dirty time saver.
>
> -Ed
>
> From: Steve Fischer <sfi...@pc...>
> Date: 2004/11/05 Fri AM 11:25:32 EST
> To: gusdev-gusdev <gus...@li...>,
>     "Aaron J. Mackey" <am...@pc...>
> Subject: [Gusdev-gusdev] GUS & bioperl
>
> folks-
>
> We should immediately explore a GUS <--> bioperl adaptor.
>
> we would use it for:
>   - Genbank and TIGR XML -> GUS
>   - GUS -> GBrowse
>   - possibly GUS -> Chado
>
> Here is what Aaron has to say about parsing Genbank, etc:
>
> Bio::SeqIO::GenBank is the BioPerl parser for GenBank; it parses and
> represents all of it (split between Bio::Seq [sequence, id, accession,
> etc], Bio::SeqFeature [everything found in the feature table] and
> Bio::Annotation [comments, references, etc] objects). Similar parsers
> exist for practically all common sequence formats (including TIGR-XML
> and other genome annotation-relevant formats).
>
> Here is what Aaron has to say about GBrowse:
>
> IMO, the "best" way to generate (valid) GFF is to use BioPerl's tools
> for GFF manipulation: Bio::Tools::GFF in older BioPerl releases, and
> Bio::Feature::IO in the latest development release (due out any day
> now, as soon as I stop reading my email; for now, you can get it from
> CVS).
>
> To use these tools, you build Bio::SeqFeature objects that represent
> the items you wish to dump as GFF; thus you can build complicated
> hierarchies of gene models, exons, CDS, UTR, etc, adding deeply
> structured attributes/annotations to each, and let the BioPerl GFF
> code figure out how to represent it (in GFF2 or GFF3) so that other
> tools (including GBrowse) can read it.
>
> Ed Robinson
> 255 Deerfield Rd
> Bogart, GA 30622
> (706)425-9181
>
> Sweet Caroline
>
> good times never seemed so good.
> I've been inclined
> to believe they never would.
> --Neil Diamond
>
> We're just a bunch of idiots.
> --Johnny Damon
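
As a rough illustration of the two steps Ed describes (query the data, then
hand each row to a feature object that knows how to print itself as GFF),
here is a minimal sketch in Python rather than BioPerl. Everything named in
it is an assumption made up for the example: the `feature` table and its
columns, the `Feature` class, and the "GUS" source tag. None of it reflects
the real GUS schema or the Bio::SeqFeature API; it only shows why keeping the
formatting out of the SQL query leaves the serializer free to handle escaping
and column layout in one place.

```python
import sqlite3
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Feature:
    """Hypothetical stand-in for a sequence feature (not a BioPerl class)."""
    seq_id: str
    source: str
    type: str
    start: int
    end: int
    strand: str = "."
    attributes: dict = field(default_factory=dict)

    def to_gff3(self) -> str:
        # Column 9 is a ';'-separated list of key=value pairs; reserved
        # characters inside values (such as ';' and '=') are percent-encoded.
        attrs = ";".join(
            f"{key}={quote(str(value), safe=' ')}"
            for key, value in self.attributes.items()
        ) or "."
        # Columns: seqid, source, type, start, end, score, strand, phase,
        # attributes. Score and phase are left as "." in this sketch.
        cols = [self.seq_id, self.source, self.type, str(self.start),
                str(self.end), ".", self.strand, ".", attrs]
        return "\t".join(cols)


def fetch_features(conn):
    """Step 1: plain SQL with no GFF formatting. Step 2: map rows to objects."""
    rows = conn.execute(
        "SELECT seq_id, feature_type, start_pos, end_pos, is_reversed, name "
        "FROM feature ORDER BY start_pos, end_pos DESC"
    )
    for seq_id, ftype, start, end, is_reversed, name in rows:
        yield Feature(seq_id, "GUS", ftype, start, end,
                      "-" if is_reversed else "+", {"ID": name})


def demo():
    # In-memory database with a made-up table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (seq_id TEXT, feature_type TEXT, "
        "start_pos INTEGER, end_pos INTEGER, is_reversed INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("chr1", "exon", 100, 400, 0, "gene1;exon1"),
            ("chr1", "gene", 100, 900, 0, "gene1"),
        ],
    )
    return ["##gff-version 3"] + [f.to_gff3() for f in fetch_features(conn)]


if __name__ == "__main__":
    print("\n".join(demo()))
```

The second row's name contains a ';' on purpose: a query that concatenates
the GFF line in SQL would have to remember to escape it, whereas here the
feature object does so for every attribute.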