From: Ed R. <ero...@ug...> - 2005-06-10 14:58:13
It is robust for EMBL but it is not fully tested for TIGR. The main TIGR dataset we use is not consistent with the TIGR DTD, so the testing on that data is incomplete.

If anyone uses the plugin, they will have to update the XML Map to add their features and they may need to make a few other modifications to handle the structure of their GB data. When I used the sequence loader to load S.mansoni data, we found that gene features had multiple db_xrefs, so I had to modify that subroutine.

IF ANYBODY IN THE GUS COMMUNITY WANTS TO USE THE PLUGIN FOR THEIR GB, EMBL OR TIGR DATA, I WILL GLADLY DO THE SUPPORT WORK ON THE PLUGIN. The more data we load with the plugin, the more robust it will become. This is the best way I can think of to work out the kinks.

Ultimately, the plugin should support GB, EMBL, DDBJ, TIGR, Chado and any other rich-seq format supported by BioPerl.

-ed

---- Original message ----
>Date: Fri, 10 Jun 2005 10:19:19 -0400
>From: Steve Fischer <sfi...@pc...>
>Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files
>To: "Pablo N. Mendes" <pa...@pa...>
>Cc: gus...@li...
>
>The UGA and Penn folks are working on a plugin that uses bioperl to
>parse the input sequence/feature files, and then load them into
>GUS. It takes a simple XML mapping file that specifies how to go from
>the bioperl objects to gus objects.
>
>it is nowhere near as sophisticated as the GUS XML made by Terry Clark.
>
>It will handle genbank, tigr xml and embl.
>
>so far it is working in production for genbank files (but, it only
>inserts and will update soon)
>
>basically, at the start it will be a replacement for the GBParser plugin.
>
>for the 3.5 release it will be called InsertGenbankSequenceRecords (up
>for debate). (Ed, how robust is it for embl and/or tigr xml?)
>
>steve
>
>Pablo N. Mendes wrote:
>
>> Hi folks,
>> I find working with tab delimited files quite uncomfortable and
>> sometimes dangerous.
>> We don't have ways to check well formedness or schema compliance (like
>> in XML with XSDs or DTDs).
>> This could cause execution halts after a long time running or worse:
>> wrong data loaded into the database.
>>
>> I defend the idea of having such a generic plugin for loading XML into
>> GUS, also based on
>> a data description file. I've noticed that NCBI already offers XML as a
>> possible format for download.
>> Other data sources tend to do the same.
>>
>> Any thoughts on this?
>>
>> About the GUS XML effort, I find it very interesting. I'll check the
>> material to get to know it better.
>>
>> Best,
>> Pablo
>>
>> ----- Original Message -----
>> From: "Terry Clark" <tc...@it...>
>> To: "Eric E. Snyder" <es...@vb...>
>> Cc: <gus...@li...>
>> Sent: Thursday, June 09, 2005 7:43 PM
>> Subject: Re: [GUSDEV] Generic GUS data loader for tab delimited files
>>
>>
>>> Dear Eric,
>>> We have such an effort underway using XML formatted input data.
>>> Here's a pointer to the project
>>> http://flora.ittc.ku.edu/xmlgus/
>>> This method requires
>>> some_format -> GUS' XML -> GUS object layer
>>>
>>> The system, running as a plugin, reads input in a GUS XML format that
>>> is formatted to correspond with relational tables and GUS objects.
>>> The mapping is instantiated in the XMLGUS framework as a YACC grammar
>>> chosen for structure and the declarative approach for the plugin.
>>> We're adding automation to some of the intermediate steps presently.
>>> I'd be happy to help you try this out if you are interested.
>>>
>>> all the best,
>>>
>>> Terry
>>>
>>> On 0, "Eric E. Snyder" <es...@vb...> wrote:
>>>
>>>> Dear GUSdev,
>>>>
>>>> We have been having some trouble loading DNA annotation data via the
>>>> gbparser plugin. We have been able to get around the problem in this
>>>> instance by using addrow, which is quite general but impossibly slow. I
>>>> cannot help but think there must be a generic tool for loading
>>>> tab-delimited data files into GUS.
>>>>
>>>> Assuming there isn't, I think it would be time well spent if someone
>>>> wrote a plugin for GUS that would *efficiently* load data in
>>>> tab-delimited format based on instructions described in a
>>>> general-purpose data description file. This file would identify the
>>>> tables and fields corresponding to each column in the input file. It
>>>> would also need to define the rules for associating data from records
>>>> stored in multiple tables and probably do other things as well.
>>>>
>>>> Any takers? I would be happy to spend whatever time is necessary to
>>>> define the requirements for such a system. If it doesn't already exist
>>>> somewhere in the GUS community, I certainly think it would be useful.
>>>>
>>>> I apologize in advance if this is a recent or frequent topic for this
>>>> list. I just subscribed and wasn't able to access sourceforge to check
>>>> the archives.
>>>>
>>>> Thanks!
>>>> eesnyder
>>>> --
>>>> Eric E. Snyder, Ph.D.
>>>> Virginia Bioinformatics Institute
>>>> Washington Street Phase 1 (0447)
>>>> Virginia Polytechnic Institute and State University
>>>> Blacksburg, VA 24061
>>>> USA
>>>>
>>>> Office: (540) 231-5428
>>>> Mobile: (540) 230-5225
>>>> Fax: (540) 231-2891
>>>> Email: ees...@vb...
>>>> JDAM: N 37 12'01.6", W 80 24'26.9"
>>>
>>> -------------------------------------------------------
>>> This SF.Net email is sponsored by: NEC IT Guy Games. How far can you
>>> shotput
>>> a projector? How fast can you ride your desk chair down the office
>>> luge track?
>>> If you want to score the big prize, get to know the little guy.
>>> Play to win an NEC 61" plasma display: http://www.necitguy.com/?r=20
>>> _______________________________________________
>>> Gusdev-gusdev mailing list
>>> Gus...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

-----------------
Ed Robinson
Center for Tropical and Emerging Global Diseases
University of Georgia, Athens, GA 30602
ero...@ug.../(706)542.1447/254.8883
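
The data description file proposed in the quoted message (a file that names the table and field for each input column) could be sketched roughly as below. This is only an illustration in Python, not code from GUS or from any plugin mentioned in this thread: `COLUMN_MAP`, `rows_by_table`, and every table and field name are hypothetical, and the sketch stops at grouping values per table rather than writing to a database. It checks the column count of every line before returning anything, so a malformed file is rejected up front instead of halting partway through a load.

```python
import csv
import io

# Hypothetical description: column position -> (table, field).
# These table and field names are invented for illustration only;
# they are not real GUS tables. Positions are assumed to run 0..n-1.
COLUMN_MAP = {
    0: ("sequence", "source_id"),
    1: ("sequence", "description"),
    2: ("feature", "name"),
    3: ("feature", "start_pos"),
}


def rows_by_table(tsv_text, column_map):
    """Turn each tab-delimited line into one dict of fields per target table.

    Raises ValueError on any line with the wrong number of columns, so
    nothing is returned for a file that is not fully consistent.
    """
    expected = len(column_map)
    parsed = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for line_no, cols in enumerate(reader, start=1):
        if len(cols) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} columns, got {len(cols)}"
            )
        record = {}
        for index, (table, field) in column_map.items():
            # Group the values that belong to the same table together.
            record.setdefault(table, {})[field] = cols[index]
        parsed.append(record)
    return parsed


# Made-up sample input: two lines, four columns each.
sample = "AB0001\texample contig\tgeneA\t120\nAB0002\tanother contig\tgeneB\t45\n"
records = rows_by_table(sample, COLUMN_MAP)
```

Each entry in `records` holds one dict per table, which a real loader would then hand to whatever insert mechanism it uses. The rules for associating records across tables, which the quoted message also asks for, are not covered here.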