Re: [Treebase-devel] analysis metadata in TreeBASE, some experiments


Interesting solution. MIAPA will definitely need to tackle analysis info, even if only at a rudimentary level (such as what produced what from what). TreeBASE used to have only analysis records; the analysis step records were introduced with the idea that submitters might want to describe more complex analyses in a multi-step fashion (e.g. took matrix x and produced set of trees y, then took set of trees y and produced consensus tree z, etc.). But alas, very few submitters have taken advantage of this: most have just one step per analysis. Yet having multiple steps adds a slightly greater mouse-clicking burden. Maybe we should consider abandoning the multi-step design and collapsing it down to single-step analysis entries? How might MIAPA come to an opinion on matters like this?

bp

On Jun 13, 2011, at 8:37 AM, Rutger Vos wrote:

> Hi all,
> 
> over the weekend I did some experimentation with how additional
> metadata having to do with phylogenetic analyses stored by TreeBASE
> could be serialized. Attached is the result as produced by a test case
> that I committed to the TreeBASE source.
> 
> For context, here is how TreeBASE sees the world: every submission to
> TreeBASE consists of the results of one or more analyses. Each
> analysis consists of one or more analysis steps. For each step, we
> store the "algorithm" (e.g. neighbor joining) and the "software" (e.g.
> PAUP). Optional additional metadata can consist of a textual
> description of the algorithm, a version number and URL of the software
> and a text string containing analysis step commands (perhaps something
> like a PAUP block).
> 
> Every analysis step has input and output data. These data can be trees
> and matrices. The set of taxa in the input must be a superset of the
> taxa in the output (i.e. some sort of taxon pruning is allowed, but
> new taxa cannot be introduced during an analysis step). All data
> that's accessible to third parties (i.e. all public, non-embargoed
> data) must be the input or output of at least one analysis step, i.e.
> we don't allow orphaned data in completed submissions.
> 
> In the attached example, I'm annotating the study (i.e. the root of
> the nexml document) to specify the permanent URLs of any associated
> analyses, and I annotate those analysis URLs with their respective
> analysis steps, specifying their PURLs and any additional metadata as
> described above. This is shown in lines 3-13.
> 
> Then, for every data object I specify for which analysis step(s) it is
> the input and/or output (a data object can be both input and output if
> analysis steps are chained together). This is shown in line 448 for a
> character state matrix and line 1849 for a tree.
> 
> This is all highly experimental, but I figured I'd share it as a
> discussion piece for refining the actual implementation of MIAPA
> annotations.
> 
> Rutger
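
For readers following the thread, the data model described in the quoted message (analyses made of steps, steps with input and output data, no new taxa introduced by a step, no orphaned data) can be sketched roughly as below. This is only an illustrative sketch, not TreeBASE's actual schema or code: the class names, field names, and the ids "M1", "T1", "T2" are hypothetical, and it assumes the superset rule is checked against the union of all input taxa of a step.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """A tree or a matrix with its taxon set (hypothetical representation)."""
    id: str
    kind: str  # "tree" or "matrix"
    taxa: frozenset


@dataclass
class AnalysisStep:
    algorithm: str                      # e.g. "neighbor joining"
    software: str                       # e.g. "PAUP"
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    # Optional metadata mentioned in the message above.
    algorithm_description: str = ""
    software_version: str = ""
    software_url: str = ""
    commands: str = ""                  # e.g. the text of a PAUP block

    def introduced_taxa(self):
        """Return taxa found in the outputs but in none of the inputs.

        Assumption: the rule is applied to the union of all input taxa.
        An empty result means the step only kept or pruned taxa.
        """
        in_taxa = frozenset().union(*(d.taxa for d in self.inputs))
        out_taxa = frozenset().union(*(d.taxa for d in self.outputs))
        return out_taxa - in_taxa


@dataclass
class Analysis:
    steps: list


def orphaned(data_objects, analyses):
    """Return data objects that are neither input nor output of any step."""
    used = {
        d.id
        for a in analyses
        for s in a.steps
        for d in s.inputs + s.outputs
    }
    return [d for d in data_objects if d.id not in used]


# A two-step chain: matrix -> set of trees -> consensus tree.
matrix = DataObject("M1", "matrix", frozenset({"A", "B", "C"}))
trees = DataObject("T1", "tree", frozenset({"A", "B", "C"}))
consensus = DataObject("T2", "tree", frozenset({"A", "B"}))

step1 = AnalysisStep("parsimony", "PAUP", [matrix], [trees])
step2 = AnalysisStep("consensus", "PAUP", [trees], [consensus])
analysis = Analysis([step1, step2])
```

In this sketch `trees` is both the output of `step1` and the input of `step2`, which matches the chained case mentioned in the message. Collapsing to single-step analyses, as suggested in the reply, would amount to each `Analysis` holding exactly one `AnalysisStep`.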