Re: Re: [Gusdev-gusdev] GUS & bioperl

> We put Bioperl data into GUS and we've 
> retrospectively documented what goes where so all developers understand 
> how things work. It also highlights anything that is "missing".

Can you post copies of any documents you have to the list?

This is the main problem we are having right now: there isn't an agreed-upon, documented mapping of what goes where in GUS. Your document would be a great starting point for a discussion about creating such a standard.

I attached a comparison earlier of where the GBParser and the TIGRXMLParser put our data. Let me know if I should send it again. I assume you are all using an EMBL-based parser.

-ed

> 
> From: Paul Mooney <pj...@sa...>
> Date: 2004/11/05 Fri PM 03:46:43 EST
> To: Steve Fischer <sfi...@pc...>
> CC: Ed Robinson <ed_...@be...>, 
>        gusdev-gusdev <gus...@li...>, 
>        "Aaron J. Mackey" <am...@pc...>
> Subject: Re: [Gusdev-gusdev] GUS & bioperl
> 
> 
> On 5 Nov 2004, at 17:40, Steve Fischer wrote:
> 
> > About GBrowse: it is good that Haiming has put together a prototype
> > for loading GUS data into GBrowse.
> >
> > But, as Aaron points out, we will likely be putting sophisticated
> > data into GBrowse. I would rather start on a strong foundation than
> > invest resources in a solution that we will outgrow.
> >
> > It should not be hard to put GUS data into BioPerl.
> >
> 
> We would welcome GUS-to-Bioperl software :)
> 
> Ed has a point: mapping GUS objects to bioperl objects and back again 
> needs some thought. 
> 
> I hope GFF output has improved in the latest CVS version of Bioperl;
> the stable 1.4 release was not up to scratch for me, so I just wrote my
> own :(
> 
> 
> >  steve
> >
> >  Ed Robinson wrote:
> >
> > The first step in writing such an adapter, though, needs to be a
> > document that shows which fields, in which formats, go where in GUS.
> > One of the main problems with the parsers is that they have been
> > developed without a common document saying what kind of information
> > goes where.
> >
> > To this end, I have a simple analysis of where our TCruzi and Crypto
> > data are being loaded by the different parsers. I am attaching copies
> > of these two brief documents in MS Word format (the analysis itself is
> > currently in OpenOffice format). I would like to use these to start
> > developing a data-destination document that we can use as a standard
> > for all further parser development.
> >
> > Also, I am not sure that this solution is really necessary for GFF
> > format. Writing a GFF adapter involves two steps: (1) querying the
> > data and (2) passing it correctly to BioPerl. The solution we have so
> > far is simply to put the formatting information in the SQL query, so
> > both happen in one step. Of course, this solution is ignorant of the
> > GUS object model. It would be nice to embed this process in an object
> > that maps from GUS objects to BioPerl, for a number of reasons. But I
> > also think it might be something to put off until later, since the
> > formatted SQL query is a quick-and-dirty time saver.
> >
> > -Ed
> >
> >
> >
> >
> >
> > From: Steve Fischer <sfi...@pc...>
> > Date: 2004/11/05 Fri AM 11:25:32 EST
> > To: gusdev-gusdev <gus...@li...>,
> >         "Aaron J. Mackey" <am...@pc...>
> > Subject: [Gusdev-gusdev] GUS & bioperl
> >
> > folks-
> >
> > We should immediately explore a GUS <--> bioperl adaptor.
> >
> > we would use it for:
> >    - Genbank and TIGR XML -> GUS
> >    - GUS -> GBrowse
> >    - possibly GUS -> Chado
> >
> > Here is what Aaron has to say about parsing Genbank, etc:
> >
> > Bio::SeqIO::GenBank is the BioPerl parser for GenBank; it parses and
> > represents all of it (split between Bio::Seq [sequence, id, accession,
> > etc], Bio::SeqFeature [everything found in the feature table] and
> > Bio::Annotation [comments, references, etc] objects).  Similar parsers
> > exist for practically all common sequence formats (including TIGR-XML
> > and other genome annotation-relevant formats).
> >
> > Here is what Aaron has to say about GBrowse:
> >
> > IMO, the "best" way to generate (valid) GFF is to use BioPerl's tools
> > for GFF manipulation: Bio::Tools::GFF in older BioPerl releases, and
> > Bio::FeatureIO in the latest development release (due out any day
> > now, as soon as I stop reading my email; for now, you can get it from
> > CVS).
> >
> > To use these tools, you build Bio::SeqFeature objects that represent
> > the items you wish to dump as GFF; thus you can build complicated
> > hierarchies of gene models, exons, CDS, UTR, etc, adding deeply
> > structured attributes/annotations to each, and let the BioPerl GFF code
> > figure out how to represent it (in GFF2 or GFF3) so that other tools
> > (including Gbrowse) can read it.
> >
> > Ed Robinson
> > 255 Deerfield Rd
> > Bogart, GA 30622
> > (706)425-9181
> > Sweet Caroline
> >
> > good times never seemed so good.
> > I've been inclined
> > to believe they never would.
> >      --Neil Diamond
> >
> >
> > We're just a bunch of idiots.
> >       --Johnny Damon
> 
> 
