Re: [Treebase-devel] application-to-TreeBASE direct submission?

Based on what Bill and Hilmar are saying, there is some enthusiasm
for this.  So let me make some comments and ask a few questions with  
the aim of stimulating discussion on what is the best way to proceed.

1.  We now have a clear (albeit provisional) target for a metadata  
standard, which is the reconciled MIAPA draft checklist from the  
recent TDWG workshop:

   http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/MIAPADraft#Reconciled_draft_checklist

This specifies what kinds of things need to be said to satisfy
"minimum" reporting, e.g., "alignment method".  But it does not  
provide a controlled vocabulary, a grammar, or a syntax for that.

2.  The submission interface could be based on NeXML, as Bill  
suggests, i.e., all of the metadata could be packed into NeXML  
elements and streamed to TB.  This has some advantages in terms of  
promoting standardization and building on all of the grammar and  
syntax that NeXML has already.

The "low-hanging fruit" version that Bill describes would mean just  
putting a text blob into a NeXML "submission" or "miapa_checklist"   
element designed specifically for this purpose. Given external  
vocabulary support, NeXML can support something a bit better than  
this, which is to have a  "submission" or "miapa_checklist" bag filled  
with RDF-like triples (using NeXML's scheme for this). A further step  
might be to build some of the logical structure of the MIAPA checklist  
into the NeXML schema, though this raises the question of whether it  
all belongs in a "miapa-checklist" element or should be distributed in  
various places in the file (e.g., alignment method with characters,  
tree method with tree, author data at the top level, etc).
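
The triple-bag option can be sketched roughly as follows. This is only
an illustration in Python, not valid NeXML: the "miapa:" prefix, the
property names, the ids, and the example values are all made up for
the sketch. A real version would follow NeXML's own annotation scheme
and whatever vocabulary we end up agreeing on.

```python
import xml.etree.ElementTree as ET


def build_checklist_bag(annotations):
    """Build a parent element holding one child per (property, value) pair.

    Element and attribute names only imitate the general shape of nested
    metadata annotations; they are assumptions, not the NeXML schema.
    """
    # Hypothetical container for all checklist annotations.
    bag = ET.Element("meta", {"id": "miapa_checklist", "rel": "miapa:checklist"})
    for i, (prop, value) in enumerate(annotations.items(), start=1):
        # Each child is one predicate/object pair; the subject is implied
        # by the element the bag would be attached to.
        ET.SubElement(bag, "meta", {
            "id": f"m{i}",
            "property": f"miapa:{prop}",  # hypothetical vocabulary term
            "content": value,
        })
    return bag


# Placeholder values, not real software names or settings.
bag = build_checklist_bag({
    "alignment_method": "example aligner 1.0",
    "tree_method": "example tree method",
})
print(ET.tostring(bag, encoding="unicode"))
```

Attaching such a bag to the characters or trees element, rather than to
a single top-level element, would be one way to get the distributed
arrangement.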

3.  If we want to build in support for measuring MIAPA conformance  
(e.g., this submission scores 3.2 out of 7 on the checklist), then
there must be some kind of standardized grammar so that a machine can  
detect whether or not a record has specified a particular checklist  
element, e.g., alignment method.   A text blob will not suffice for  
this.
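
As a rough sketch of what machine-detectable conformance could look
like once each checklist item has its own key, here is a small Python
function. The item names are a hypothetical subset chosen for
illustration, not the actual MIAPA draft checklist, and the scoring
rule (count of non-empty items) is just one simple assumption.

```python
# Hypothetical checklist keys; the real MIAPA draft defines its own items.
REQUIRED_ITEMS = ["alignment_method", "tree_method", "authors"]


def conformance(record, required=REQUIRED_ITEMS):
    """Return (number present, number required, list of missing items).

    An item counts as present only if the record has a non-blank value
    for it, so an empty or whitespace-only field does not score.
    """
    present = [k for k in required if str(record.get(k) or "").strip()]
    missing = [k for k in required if k not in present]
    return len(present), len(required), missing


# Placeholder record: one item filled in, one blank, one absent.
score, total, missing = conformance({
    "alignment_method": "example aligner 1.0",
    "authors": " ",
})
print(f"{score} of {total} checklist items; missing: {', '.join(missing)}")
```

The same check cannot be written against a free-text blob without
guessing at what the text means, which is the point of item 3.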

4.  None of this addresses where we are going to get controlled  
vocabularies to specify alignment methods, for instance.  Several  
people have tried to address this, and there are resources out there  
that have some elements of the desired vocabulary (mygrid services  
ontology; O'Meara's treetapper resource; CDAO).  It's easy to start
this but hard to finish.  As Bill mentioned, it was a goal of CIPRES,  
too.  Every time someone tries to do this, they end up with a hornet's  
nest.  But maybe that is due to the lack of a clear target-- which  
perhaps is remedied by having a miapa checklist and an auto-submission  
problem to solve.

5.  Is it problematic that MEGA is not open-source, e.g., with respect
to devoting resources to working with a non-open-source project?
According to Sudhir (I asked him specifically about this), "the source
code for the computational core is available upon request and
permission is granted to use the computational core of MEGA for
personal research and testing only", but the GUI is based on
proprietary components and its source code is not available.  Would
this prevent us from working
with MEGA programmers at a NESCent hackathon, for instance?  Would we  
ask Sudhir to open-source the submission component of the code as a  
separate module?

Arlin

On Nov 6, 2011, at 7:50 PM, Hilmar Lapp wrote:

> Hi Arlin,
>
> I spoke with Sudhir earlier this year at the ISMB conference about
> pretty much the same thing. The Dryad-TreeBASE interface isn't secret
> in any way [1,2], and as Bill points out is quite limited in what it
> achieves.
>
> In the ABI grant proposal we submitted in July [3], we actually
> propose to create precisely such a submission API that 3rd party
> applications can use to submit richly annotated data to TreeBASE
> directly, and indeed we propose to build on the Dryad/TreeBASE
> handshaking interface to accomplish this. If Sudhir has resources
> available to prototype this now, on the TreeBASE end, the MEGA end,
> or both, that'd be terrific, and I'd be happy to help as far as I
> can to facilitate that.
>
> BTW I also spoke with Sudhir about possibly supporting NeXML from
> within MEGA, and he appeared very open to that - he said that
> essentially all he needs is someone who can help by providing the
> guidance on NeXML implementation. MEGA supporting NeXML wouldn't help
> with TreeBASE submission right now, but I imagine that the envisioned
> programmable submission API would certainly rely on NeXML.
>
> 	-hilmar
>
> [1] https://datadryad.org/wiki/TreeBASE_Submission_Integration
> [2] https://datadryad.org/wiki/BagIt_Handshaking
> [3] http://www.evoio.org/wiki/ABI_2011_proposal
>
> On Nov 4, 2011, at 7:53 AM, Arlin Stoltzfus wrote:
>
>> Hello all.  Yesterday I had a talk with Sudhir Kumar, author of MEGA,
>> which probably is responsible for more published trees than any other
>> phylogeny inference package (not necessarily the most trees among the
>> phylogeny elite represented in TreeBASE).  I discovered that MEGA has
>> a graphical name-reconciling interface for users to align mismatched
>> OTU names between tree and alignment files-- this is a common problem
>> and a barrier to re-use that I have encountered personally multiple
>> times.
>>
>> He suggested the idea that, to facilitate effective archiving, it
>> might be useful to have a way for phylogeny applications to generate a
>> submission in TreeBASE, providing metadata such as software version
>> and run conditions.
>>
>> Probably you have heard this suggestion before (I heard it earlier
>> this week from Joseph Hughes in regard to BEAST).
>>
>> I mentioned that TreeBASE has a top-secret interface that Dryad uses
>> to submit NEXUS files, and that this could be the basis for a
>> submission interface for other applications.  My understanding is that
>> this is done via web-services, and that the user gets a link to a
>> temporary submission that must be completed interactively.  I hope I
>> didn't give the wrong impression.
>>
>> Anyway, Sudhir was very interested in this.  He said that he has
>> programmers with time to work on this kind of thing.   If the MEGA
>> team prototyped a direct-submission interface, they could write a
>> brief paper about it, and maybe we could get other developers together
>> to hash out the metadata terms to support,  based on the recent MIAPA
>> exercise at TDWG.   If we could get MEGA and the top 3 TreeBASE
>> programs (PAUP, MB, RAXML-- right?), that would cover a very large
>> segment of users.
>>
>> I realize that this approach might not be the best way to promote
>> archiving in the long-term.  However, it might be more effective in
>> the short term, and we might learn a lot from it.
>>
>> I'd like to hear any thoughts you have on this.  Would this be a
>> useful exercise?  What are the disadvantages?  How could it fit into a
>> larger strategy?
>>
>> Arlin
>> -------
>> Arlin Stoltzfus (ar...@um...)
>> Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
>> IBBR, 9600 Gudelsky Drive, Rockville, MD
>> tel: 240 314 6208; web: www.molevol.org
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================
>
>
>

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org