From: Arlin S. <ar...@um...> - 2011-11-07 14:08:36
Based on what Bill and Hilmar are saying, there is some enthusiasm for this. So let me make some comments and ask a few questions, with the aim of stimulating discussion on the best way to proceed.

1. We now have a clear (albeit provisional) target for a metadata standard, which is the reconciled MIAPA draft checklist from the recent TDWG workshop:

http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/MIAPADraft#Reconciled_draft_checklist

This specifies what kinds of things need to be said to satisfy "minimum" reporting, e.g., "alignment method". But it does not provide a controlled vocabulary, a grammar, or a syntax for saying them.

2. The submission interface could be based on NeXML, as Bill suggests, i.e., all of the metadata could be packed into NeXML elements and streamed to TB. This has some advantages in terms of promoting standardization and building on the grammar and syntax that NeXML already has. The "low-hanging fruit" version that Bill describes would mean just putting a text blob into a NeXML "submission" or "miapa_checklist" element designed specifically for this purpose. Given external vocabulary support, NeXML can support something a bit better than this, which is to have a "submission" or "miapa_checklist" bag filled with RDF-like triples (using NeXML's scheme for this). A further step might be to build some of the logical structure of the MIAPA checklist into the NeXML schema, though this raises the question of whether it all belongs in a "miapa_checklist" element or should be distributed in various places in the file (e.g., alignment method with characters, tree method with tree, author data at the top level, etc.).

3. If we want to build in support for measuring MIAPA conformance (e.g., this submission satisfies 3.2 out of 7 checklist items), then there must be some kind of standardized grammar so that a machine can detect whether or not a record has specified a particular checklist element, e.g., alignment method. A text blob will not suffice for this.

4. None of this addresses where we are going to get controlled vocabularies to specify alignment methods, for instance. Several people have tried to address this, and there are resources out there that have some elements of the desired vocabulary (mygrid services ontology; O'Meara's treetapper resource; CDAO). It's easy to start this but hard to finish. As Bill mentioned, it was a goal of CIPRES, too. Every time someone tries to do this, they end up with a hornet's nest. But maybe that is due to the lack of a clear target, which perhaps is remedied by having a MIAPA checklist and an auto-submission problem to solve.

5. Is it problematic that MEGA is not open-source, e.g., with respect to devoting resources to working with non-open-source software? According to Sudhir (I asked him specifically about this), "the source code for the computational core is available upon request and permission is granted to use the computational core of MEGA for personal research and testing only", but the GUI is based on proprietary components and its source code is not available. Would this prevent us from working with MEGA programmers at a NESCent hackathon, for instance? Would we ask Sudhir to open-source the submission component of the code as a separate module?

Arlin

On Nov 6, 2011, at 7:50 PM, Hilmar Lapp wrote:

> Hi Arlin,
>
> I spoke with Sudhir earlier this year at the ISMB conference about
> pretty much the same thing. The Dryad-TreeBASE interface isn't secret
> in any way [1,2], and as Bill points out is quite limited in what it
> achieves.
>
> In the ABI grant proposal we submitted in July [3], we actually
> propose to create precisely such a submission API that 3rd party
> applications can use to submit richly annotated data to TreeBASE
> directly, and indeed we propose to build on the Dryad/TreeBASE
> handshaking interface to accomplish this.
> If Sudhir has resources available to prototype this now, at the end
> of TreeBASE or MEGA or both, that'd be terrific, and I'd be happy to
> help as far as I can to facilitate that better.
>
> BTW I also spoke with Sudhir about possibly supporting NeXML from
> within MEGA, and he appeared very open to that - he said that
> essentially all he needs is someone who can help by providing the
> guidance on NeXML implementation. MEGA supporting NeXML wouldn't help
> with TreeBASE submission right now, but I imagine that the envisioned
> programmable submission API would certainly rely on NeXML.
>
> -hilmar
>
> [1] https://datadryad.org/wiki/TreeBASE_Submission_Integration
> [2] https://datadryad.org/wiki/BagIt_Handshaking
> [3] http://www.evoio.org/wiki/ABI_2011_proposal
>
> On Nov 4, 2011, at 7:53 AM, Arlin Stoltzfus wrote:
>
>> Hello all. Yesterday I had a talk with Sudhir Kumar, author of MEGA,
>> which probably is responsible for more published trees than any other
>> phylogeny inference package (not necessarily the most trees among the
>> phylogeny elite represented in TreeBASE). I discovered that MEGA has
>> a graphical name-reconciling interface for users to align mismatched
>> OTU names between tree and alignment files-- this is a common problem
>> and a barrier to re-use that I have encountered personally multiple
>> times.
>>
>> He suggested the idea that, to facilitate effective archiving, it
>> might be useful to have a way for phylogeny applications to generate
>> a submission in TreeBASE, providing metadata such as software version
>> and run conditions.
>>
>> Probably you have heard this suggestion before (I heard it earlier
>> this week from Joseph Hughes in regard to BEAST).
>>
>> I mentioned that TreeBASE has a top-secret interface that Dryad uses
>> to submit NEXUS files, and that this could be the basis for a
>> submission interface for other applications.
>> My understanding is that this is done via web-services, and that the
>> user gets a link to a temporary submission that must be completed
>> interactively. I hope I didn't give the wrong impression.
>>
>> Anyway, Sudhir was very interested in this. He said that he has
>> programmers with time to work on this kind of thing. If the MEGA
>> team prototyped a direct-submission interface, they could write a
>> brief paper about it, and maybe we could get other developers
>> together to hash out the metadata terms to support, based on the
>> recent MIAPA exercise at TDWG. If we could get MEGA and the top 3
>> TreeBASE programs (PAUP, MB, RAXML-- right?), that would cover a very
>> large segment of users.
>>
>> I realize that this approach might not be the best way to promote
>> archiving in the long-term. However, it might be more effective in
>> the short term, and we might learn a lot from it.
>>
>> I'd like to hear any thoughts you have on this. Would this be a
>> useful exercise? What are the disadvantages? How could it fit into
>> a larger strategy?
>>
>> Arlin
>> -------
>> Arlin Stoltzfus (ar...@um...)
>> Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
>> IBBR, 9600 Gudelsky Drive, Rockville, MD
>> tel: 240 314 6208; web: www.molevol.org
>>
>> _______________________________________________
>> Treebase-devel mailing list
>> Tre...@li...
>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
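
To make point 3 of the message above a bit more concrete (a machine can only score conformance if each checklist item is stated in a detectable, structured way), here is a minimal Python sketch. Everything named in it is an assumption for illustration: the `ex:` predicate names, the four checklist items (picked from items mentioned in this thread, not copied from the actual MIAPA draft), and the sample values. It does not use NeXML's real annotation syntax; it only models the "bag of RDF-like triples" idea.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One RDF-like statement about a submission (hypothetical model)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping from checklist item to the predicate that states it.
# The real MIAPA draft defines its own items; these four are placeholders.
CHECKLIST = {
    "alignment method": "ex:alignmentMethod",
    "tree method": "ex:treeMethod",
    "author data": "ex:author",
    "software version": "ex:softwareVersion",
}


def conformance(triples: Iterable[Triple]) -> Tuple[int, int, List[str]]:
    """Return (items satisfied, items total, sorted list of missing items).

    An item counts as satisfied only if a triple with its predicate
    exists and carries a non-empty value.
    """
    present = {t.predicate for t in triples if t.obj.strip()}
    satisfied = {item for item, pred in CHECKLIST.items() if pred in present}
    missing = sorted(set(CHECKLIST) - satisfied)
    return len(satisfied), len(CHECKLIST), missing


if __name__ == "__main__":
    # Structured record: two items stated, one left empty, one absent.
    structured = [
        Triple("study1", "ex:alignmentMethod", "MUSCLE"),
        Triple("study1", "ex:softwareVersion", "MEGA 5"),
        Triple("study1", "ex:treeMethod", ""),
    ]
    print(conformance(structured))

    # Text blob: the same facts, but no machine-detectable items.
    blob = [Triple("study1", "ex:freeText",
                   "Aligned with MUSCLE; tree built in MEGA 5.")]
    print(conformance(blob))
```

The text-blob record scores zero even though a human reader can see the alignment method in it, which is the limitation that point 3 describes. Distributing the statements through the file (point 2) would change where the triples are collected from, not how they are scored.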