From: Aaron J. M. <am...@pc...> - 2004-11-09 19:42:35
A quick Google search reveals that efforts to make GUS more BioPerl-aware have (seemingly) been attempted since (at least) 2002. I wonder, then, why everyone on the mailing lists sounds so startled to hear about BioPerl's capabilities, and where the previous efforts have led ...

Steve, you mention that you're afraid BioPerl won't be capable of capturing all the information contained in a GUS deployment; though I have only a passing acquaintance with GUS, I believe the only aspect of GUS that BioPerl would not be able to represent (in some meaningful form) is the RAD schema. BioPerl has both strongly-typed and weakly-typed feature/annotation systems, and includes facilities for data related to (in no particular order) Sequences, Assemblies, Alignments, Trees, Graphs, Ontologies, Pedigrees, Analyses, Coordinate Systems, Databases, Events, Matrices, Phenotypes, Species, Symbols, SNPs, etc. For instance, I would argue that using even just a weakly-typed sequence feature system, one can successfully represent all sequence-related data (although the prior post about semantic difficulties remains relevant; BioPerl may not share the same semantics as GUS [or may be missing them entirely], and this will have to be programmed).

Also, I don't see GenBank locations ("fuzzy locations") as a big issue, since you're only talking about exposing GUS data (which doesn't have fuzzy locations) to BioPerl, or importing GUS-capable data (which presumably will also not have fuzzy locations). That said, BioPerl has an entire suite of location/coordinate handling modules (including support for split and/or fuzzy locations, representing the entire GenBank location model), and mechanisms/policies for "focussing" fuzzy locations into exact locations. You're not the first group to ever face such problems.
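To make the "focussing" idea concrete, here is a minimal sketch in Python of how a fuzzy location might be collapsed into exact coordinates under a chosen policy. This is not BioPerl's API: `FuzzyPos`, `focus`, and the policy names "widest" and "narrowest" are hypothetical names invented for illustration, and the treatment of open-ended positions (clamping to 1 or to the sequence length) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FuzzyPos:
    """A coordinate known only to lie within [lo, hi] (hypothetical type).

    None means the bound is open, as in a GenBank "<10" or ">50" position.
    """
    lo: Optional[int]
    hi: Optional[int]


def focus(start: FuzzyPos, end: FuzzyPos, seq_len: int,
          policy: str = "widest") -> Tuple[int, int]:
    """Collapse a fuzzy start/end pair into one exact (start, end) pair."""
    # Assumption: open bounds are clamped to the ends of the sequence.
    s_lo = 1 if start.lo is None else start.lo
    s_hi = seq_len if start.hi is None else start.hi
    e_lo = 1 if end.lo is None else end.lo
    e_hi = seq_len if end.hi is None else end.hi

    if policy == "widest":
        # Largest span consistent with the fuzzy bounds.
        return s_lo, e_hi
    if policy == "narrowest":
        # Smallest span consistent with the fuzzy bounds.
        return s_hi, e_lo
    raise ValueError(f"unknown policy: {policy!r}")


# A location like "(10.20)..>50" on an 80-residue sequence.
start = FuzzyPos(10, 20)
end = FuzzyPos(50, None)
print(focus(start, end, seq_len=80, policy="widest"))
print(focus(start, end, seq_len=80, policy="narrowest"))
```

The point of the sketch is only that the choice of policy is separate from the location data itself, so data imported with fuzzy coordinates can still be handed to code that expects exact ones.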
Now, you will certainly find as you get more into BioPerl that there are particularly dusty corners (or corners that are ill-documented but in active use); I would argue that these are areas where your developers might spend some of the valuable time that was saved by using BioPerl in the first place to make improvements to BioPerl, rather than backing away and saying "Oh look, the last 5% of the functionality we need isn't there, I guess we'll have to do it all ourselves" (cf: the Ensembl project).

BioPerl is becoming the lingua franca of wet-bench biologists who are picking up Perl programming for their own lightweight bioinformatics tasks. In 2003, various members of the BioPerl team gave many short lectures and workshops at their various host institutions, and at national meetings; in 2004, week-long BioPerl workshops were held for biologist/programmers at the University of Montreal, CSHL and the NIH. Requests for BioPerl training in 2005 are numerous and growing. Leaders of the NCI's caBIO and EBI's EnsEMBL projects recognize this, and are currently designing their own "bridges" between their code bases and BioPerl. As such, there is much chatter on the BioPerl lists about ways to improve the interoperability of BioPerl objects while maintaining the current "ease of use" (arguable, of course, but there are even plans to make that better as well). It would be great to have more GUS developers as part of the BioPerl community (and, frankly, it boggles my mind that there are professional bioinformatics Perl programmers who have little or no exposure to, or awareness of, BioPerl; remind me to buy more GoogleAds for bioperl.org).

Finally, I won't speak for Lincoln Stein, but I know that the GMOD project does want to include GUS in its effort to assimilate existing tools; the current GMOD developers, however, find GUS unapproachable. I would guess that Lincoln and friends would be willing to help an intrepid GUS aficionado get GUS at least talking to the GMOD toolkit (presumably leveraging BioPerl to talk ChadoXML, which probably isn't as straightforward as it sounds; cf. the semantic issues raised earlier). I'll keep listening ...

Best wishes,

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  am...@pc...
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA 19104-6017      fax:    215-746-6697