Re: [Treebase-devel] analysis metadata in TreeBASE, some experiments

There isn't really a community process that decides what is or is not
MIAPA, so at this stage these sorts of experiments are to inform
people who are working on fleshing out the details of terms and
concepts expressed in MIAPA. The person who's most involved in
implementation details is Maryam, who I assumed was on the relevant
mailing list but who's cc'ed here as well.

On Tue, Jun 14, 2011 at 4:03 PM, William Piel <wil...@ya...> wrote:
>
> Interesting solution. MIAPA will definitely need to tackle analysis info, even if it's at a rudimentary level (like what produced what from what). TreeBASE used to only have analysis records -- analysis step records were introduced with the idea that submitters might want the ability to describe more complex analyses in a multi-step fashion (e.g. took matrix x and produced set of trees y, took set of trees y and produced consensus tree z, etc). But alas, very few submitters have taken advantage of this -- most just have one step per analysis. Yet having multiple steps adds a slightly greater mouse-clicking burden. Maybe we should consider abandoning the multi-step design, and collapsing it down to single-step analysis entries? How might MIAPA come to an opinion on matters like this?
>
> bp
>
> On Jun 13, 2011, at 8:37 AM, Rutger Vos wrote:
>
>> Hi all,
>>
>> over the weekend I did some experimentation with how additional
>> metadata having to do with phylogenetic analyses stored by TreeBASE
>> could be serialized. Attached is the result as produced by a test case
>> that I committed to the TreeBASE source.
>>
>> For context, here is how TreeBASE sees the world: every submission to
>> TreeBASE consists of the results of one or more analyses. Each
>> analysis consists of one or more analysis steps. For each step, we
>> store the "algorithm" (e.g. neighbor joining) and the "software" (e.g.
>> PAUP). Optional additional metadata can consist of a textual
>> description of the algorithm, a version number and URL of the software
>> and a text string containing analysis step commands (perhaps something
>> like a PAUP block).
>>
>> Every analysis step has input and output data. These data can be trees
>> and matrices. The set of taxa in the input must be a superset of the
>> taxa in the output (i.e. some sort of taxon pruning is allowed, but
>> new taxa cannot be introduced during an analysis step). All data
>> that's accessible to third parties (i.e. all public, non-embargoed
>> data) must be the input or output of at least one analysis step, i.e.
>> we don't allow orphaned data in completed submissions.
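
For readers who prefer code, the model and the two constraints described above could be sketched roughly as follows. This is only an illustration in Python, not TreeBASE's actual classes: every class, field and function name is hypothetical, and the taxon rule is assumed to compare the union of all input taxa against the union of all output taxa of a step.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class DataObject:
    """A tree or a matrix (hypothetical stand-in for the real records)."""
    name: str
    kind: str  # "tree" or "matrix"
    taxa: FrozenSet[str]


@dataclass
class AnalysisStep:
    algorithm: str  # required, e.g. "neighbor joining"
    software: str   # required, e.g. "PAUP"
    inputs: List[DataObject] = field(default_factory=list)
    outputs: List[DataObject] = field(default_factory=list)
    # Optional additional metadata mentioned in the text.
    algorithm_description: Optional[str] = None
    software_version: Optional[str] = None
    software_url: Optional[str] = None
    commands: Optional[str] = None

    def introduced_taxa(self) -> Set[str]:
        """Taxa present in the outputs but in none of the inputs.

        An empty result means the input taxa are a superset of the
        output taxa (assumption: unions over all inputs and outputs).
        """
        in_taxa = set().union(*(d.taxa for d in self.inputs))
        out_taxa = set().union(*(d.taxa for d in self.outputs))
        return out_taxa - in_taxa


@dataclass
class Analysis:
    steps: List[AnalysisStep]


def orphaned(data: List[DataObject], analyses: List[Analysis]) -> List[DataObject]:
    """Data objects that are neither input nor output of any step."""
    used = {d for a in analyses for s in a.steps for d in s.inputs + s.outputs}
    return [d for d in data if d not in used]


# Pruning taxon C is allowed; no new taxa appear in the output.
matrix = DataObject("matrix x", "matrix", frozenset({"A", "B", "C"}))
trees = DataObject("trees y", "tree", frozenset({"A", "B"}))
step = AnalysisStep("neighbor joining", "PAUP", inputs=[matrix], outputs=[trees])
```

Here `step.introduced_taxa()` is empty, so the step respects the superset rule; swapping the inputs and outputs would report taxon C as newly introduced.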
>>
>> In the attached example, I'm annotating the study (i.e. the root of
>> the nexml document) to specify the permanent URLs of any associated
>> analyses, and I annotate those analysis URLs with their respective
>> analysis steps, specifying their PURLs and any additional metadata as
>> described above. This is shown in lines 3-13.
>>
>> Then, for every data object I specify for which analysis step(s) it is
>> the input and/or output (a data object can be both input and output if
>> analysis steps are chained together). This is shown in line 448 for a
>> character state matrix and line 1849 for a tree.
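
As a rough illustration of the per-object annotation described above, the following Python sketch derives, for each data object, the steps it is input to and output of. The step identifiers are made up (they are not real TreeBASE PURLs), and the chained example reuses the matrix x, trees y, consensus tree z scenario from earlier in this thread.

```python
from collections import defaultdict

# Hypothetical chained steps: (step id, input names, output names).
steps = [
    ("step1", ["matrix x"], ["trees y"]),
    ("step2", ["trees y"], ["consensus tree z"]),
]


def roles_by_object(steps):
    """Map each data object name to its (role, step id) pairs."""
    roles = defaultdict(list)
    for step_id, inputs, outputs in steps:
        for name in inputs:
            roles[name].append(("input", step_id))
        for name in outputs:
            roles[name].append(("output", step_id))
    return dict(roles)


annotations = roles_by_object(steps)
```

In this sketch `annotations["trees y"]` holds both an output role (for `step1`) and an input role (for `step2`), which is the chained case where one data object carries both kinds of annotation.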
>>
>> This is all highly experimental but I figured I'd share it as a
>> discussion piece for refining actual implementation of MIAPA
>> annotations.
>>
>> Rutger
>
>

-- 
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com