From: Arlin S. <ar...@um...> - 2011-03-09 15:30:37
Hi. I'm not sure this is the right Mesquite list, because I'm trying to reach developers rather than users with some ideas for summer projects.

Some interest has been expressed in two Mesquite-related GSoC (Google Summer of Code) proposals for inclusion in the NESCent-organized "phylosoc" package of proposals. Google provides summer support for work on open-source projects. What these potential projects would need from you folks is a) a programmer committed to serving as a mentor, and b) a compelling write-up. If you might be interested, read on...

One proposal is for a graphical UI to design workflow descriptions:
http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2011#Graphical_UI_for_designing_phylo_workflow_descriptions

Annotating workflows is a stumbling block in creating reusable phylogenetic records. The idea here is that users would use drag-and-drop tools to compose a phylogenetics workflow (with pipes, flowchart icons, or whatever), and this would be converted into a set of annotations useful for an archival record. An example of this sort of thing, built with Google Web Toolkit, is http://exon.niaid.nih.gov/mobyleWorkflow . Ultimately (when hooked up to the right back end) this could be used to create executable workflow descriptions. Some folks have suggested that Mesquite would provide an ideal framework for developing this. Vivek Gopalan (who developed the example above) is one mentor, but we would need an experienced Mesquite programmer to join us.

The second idea, which is not on the proposals page yet, is an intelligent submission tool for TreeBASE. Mesquite is ideal for this because TreeBASE already recommends that users format their NEXUS files using Mesquite, for compatibility. Ideally this tool would solve two main problems, using some intelligence to aid the user.
Bill Piel reports that a major stumbling block in submission is that users start the process with a separate 1) tree and 2) alignment (or other character matrix) whose OTU names do not match. This corresponds to my own experience trying to reuse other people's data. Finding an optimal name match (the submission tool could propose a match to the user for manual verification) turns out to be a simple and well-studied problem in CS called "the marriage problem" (pairing the members of two sets one-to-one).

The second major stumbling block is that, in order to annotate provenance, users need to (tediously) match up GenBank accession numbers and species identifiers. In the case of sequence alignments, an intelligent tool could leverage NCBI services to guess the accession and species (i.e., BLAST it). Given accessions, an intelligent tool could supply NCBI species IDs (an even easier problem). Initially, this tool could create a NEXUS file with a TreeBASE block containing the annotations (in the future, presumably the preferred format will be NeXML).

What makes the second proposal sexy is the use of intelligence to aid the user. Again, Mesquite might provide an ideal development platform.

For either proposal, let me know ASAP if you are interested. The GSoC project proposals need to be finished up this week.

Thanks for your time,
Arlin
-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
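To make the name-matching idea from the TreeBASE proposal concrete, here is a minimal sketch. It assumes "the marriage problem" is read as the stable marriage problem and solves it with the Gale-Shapley algorithm. It also assumes names are scored with Python's difflib.SequenceMatcher after lowercasing and turning underscores into spaces. The scoring rule and the function names (similarity, propose_matches) are hypothetical choices for illustration, not part of Mesquite or TreeBASE. If "optimal" is meant as maximizing total similarity, a maximum-weight assignment (e.g., the Hungarian algorithm) would be the alternative, because a stable matching does not guarantee that.

```python
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    # Hypothetical normalization: case-insensitive, underscores read as spaces
    # (NEXUS taxon names often use underscores in place of spaces).
    return name.lower().replace("_", " ").strip()


def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1] between two OTU names (illustrative choice)."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def propose_matches(tree_names, matrix_names):
    """Gale-Shapley stable matching of tree tip names to matrix row names.

    Tree names "propose" in order of decreasing similarity; a matrix name
    keeps whichever proposer is most similar to it. Assumes two equal-length,
    duplicate-free lists. Returns {tree name: proposed matrix name} for the
    user to verify manually.
    """
    if len(tree_names) != len(matrix_names):
        raise ValueError("this sketch assumes equal numbers of names")
    # Each tree name's preference list over matrix names, best first.
    prefs = {t: sorted(matrix_names, key=lambda m: similarity(t, m), reverse=True)
             for t in tree_names}
    next_choice = {t: 0 for t in tree_names}  # index of next matrix name to try
    engaged = {}                              # matrix name -> tree name
    free = list(tree_names)
    while free:
        t = free.pop()
        m = prefs[t][next_choice[t]]
        next_choice[t] += 1
        rival = engaged.get(m)
        if rival is None:
            engaged[m] = t                    # m was unmatched: accept t
        elif similarity(t, m) > similarity(rival, m):
            engaged[m] = t                    # m prefers t: rival is freed
            free.append(rival)
        else:
            free.append(t)                    # m rejects t: t tries its next choice
    matched = {t: m for m, t in engaged.items()}
    return {t: matched[t] for t in tree_names}


if __name__ == "__main__":
    tree = ["Homo_sapiens", "Pan_troglodites", "Gorilla_gorilla"]
    matrix = ["gorilla gorilla", "homo sapiens", "pan troglodytes"]
    for t, m in propose_matches(tree, matrix).items():
        print(f"{t} -> {m}")
```

In this example the misspelled "Pan_troglodites" is still paired with "pan troglodytes". A real submission tool would show these proposals to the user for confirmation rather than applying them silently.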