Thread: [Treebase-devel] analysis metadata in TreeBASE, some experiments

From: Rutger V. <R....@re...> - 2011-06-13 12:45:18

Attachments: outfile.xml

Hi all,

over the weekend I did some experimentation with how additional
metadata having to do with phylogenetic analyses stored by TreeBASE
could be serialized. Attached is the result as produced by a test case
that I committed to the TreeBASE source.

For context, here is how TreeBASE sees the world: every submission to
TreeBASE consists of the results of one or more analyses. Each
analysis consists of one or more analysis steps. For each step, we
store the "algorithm" (e.g. neighbor joining) and the "software" (e.g.
PAUP). Optional additional metadata can consist of a textual
description of the algorithm, a version number and URL of the software
and a text string containing analysis step commands (perhaps something
like a PAUP block).
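
To make the hierarchy above concrete, here is a rough sketch in Python. The class and field names are my own illustrative assumptions and do not reflect the actual TreeBASE schema; the sketch only captures "one or more analyses", "one or more steps", and the optional metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Software:
    name: str                      # e.g. "PAUP"
    version: Optional[str] = None  # optional metadata
    url: Optional[str] = None      # optional metadata


@dataclass
class Algorithm:
    name: str                          # e.g. "neighbor joining"
    description: Optional[str] = None  # optional textual description


@dataclass
class AnalysisStep:
    algorithm: Algorithm
    software: Software
    commands: Optional[str] = None  # e.g. the text of a PAUP block


@dataclass
class Analysis:
    steps: List[AnalysisStep]

    def __post_init__(self):
        # Each analysis consists of one or more analysis steps.
        if not self.steps:
            raise ValueError("an analysis needs at least one step")


@dataclass
class Submission:
    analyses: List[Analysis]

    def __post_init__(self):
        # Every submission consists of the results of one or more analyses.
        if not self.analyses:
            raise ValueError("a submission needs at least one analysis")


step = AnalysisStep(Algorithm("neighbor joining"), Software("PAUP"))
submission = Submission([Analysis([step])])
```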

Every analysis step has input and output data. These data can be trees
and matrices. The set of taxa in the input must be a superset of the
taxa in the output (i.e. some sort of taxon pruning is allowed, but
new taxa cannot be introduced during an analysis step). All data
that's accessible to third parties (i.e. all public, non-embargoed
data) must be the input or output of at least one analysis step, i.e.
we don't allow orphaned data in completed submissions.
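
The two rules above could be checked roughly as follows. This is only a sketch: the function names and the dict-based representation of a step are hypothetical, not how TreeBASE implements the checks.

```python
from typing import Dict, List, Set


def step_taxa_ok(input_taxa: Set[str], output_taxa: Set[str]) -> bool:
    """True if the step only pruned taxa, i.e. introduced no new ones."""
    return output_taxa <= input_taxa


def orphaned(public_data: Set[str], steps: List[Dict[str, Set[str]]]) -> Set[str]:
    """Return public data objects that no step uses as input or output."""
    used: Set[str] = set()
    for s in steps:
        used |= s["inputs"] | s["outputs"]
    return public_data - used


steps = [{"inputs": {"matrix1"}, "outputs": {"tree1"}}]

pruned = step_taxa_ok({"A", "B", "C"}, {"A", "B"})  # pruning is allowed
added = step_taxa_ok({"A", "B"}, {"A", "B", "C"})   # new taxon "C" is not
leftover = orphaned({"matrix1", "tree1", "tree2"}, steps)
print(pruned, added, leftover)
```

A completed submission would pass only if every step satisfies the taxon check and the set of orphaned objects is empty.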

In the attached example, I'm annotating the study (i.e. the root of
the nexml document) to specify the permanent URLs of any associated
analyses, and I annotate those analysis URLs with their respective
analysis steps, specifying their PURLs and any additional metadata as
described above. This is shown in lines 3-13.

Then, for every data object I specify for which analysis step(s) it is
the input and/or output (a data object can be both input and output if
analysis steps are chained together). This is shown in line 448 for a
character state matrix and line 1849 for a tree.
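
For readers without the example file at hand, the annotation pattern can be sketched with Python's standard library. NeXML's nested meta elements (ResourceMeta and LiteralMeta) are real, but the "tb:" predicate names, the vocabulary namespace and the example.org URLs below are placeholders I made up for illustration. They are not the terms used in the actual file, and the skeleton is not schema-valid NeXML.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
TB = "http://example.org/tb-terms#"  # hypothetical vocabulary namespace
ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)


def resource_meta(parent, rel, href):
    """Add a ResourceMeta annotation linking parent to the resource at href."""
    attrs = {f"{{{XSI}}}type": "nex:ResourceMeta", "rel": rel, "href": href}
    return ET.SubElement(parent, f"{{{NEX}}}meta", attrs)


def literal_meta(parent, prop, content):
    """Add a LiteralMeta annotation holding a plain text value."""
    attrs = {f"{{{XSI}}}type": "nex:LiteralMeta",
             "property": prop, "content": content}
    return ET.SubElement(parent, f"{{{NEX}}}meta", attrs)


# Placeholder URLs; a real record would use TreeBASE PURLs here.
ANALYSIS = "http://example.org/analysis/A1"
STEP = "http://example.org/analysisstep/AS1"

# Declare the made-up "tb" prefix by hand so the predicates resolve.
root = ET.Element(f"{{{NEX}}}nexml", {"xmlns:tb": TB})

# Study -> analysis -> analysis step, with step metadata nested inside.
analysis = resource_meta(root, "tb:analysis", ANALYSIS)
step = resource_meta(analysis, "tb:analysisStep", STEP)
literal_meta(step, "tb:algorithm", "neighbor joining")
literal_meta(step, "tb:software", "PAUP")

# Data objects point back at the step they feed into or come out of.
matrix = ET.SubElement(root, f"{{{NEX}}}characters", {"id": "M1"})
resource_meta(matrix, "tb:inputOf", STEP)
trees = ET.SubElement(root, f"{{{NEX}}}trees", {"id": "T1"})
tree = ET.SubElement(trees, f"{{{NEX}}}tree", {"id": "t1"})
resource_meta(tree, "tb:outputOf", STEP)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A data object in the middle of a chain would simply carry both kinds of annotation, one for the step that produced it and one for the step that consumes it.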

This is all highly experimental but I figured I'd share it as a
discussion piece for refining actual implementation of MIAPA
annotations.

Rutger

-- 
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com

Re: [Treebase-devel] analysis metadata in TreeBASE, some experiments

From: Rutger V. <R....@re...> - 2011-06-13 12:58:39

It looks like the example file was blocked by the mailman software so
I committed it to the nexml examples directory. Here it is:
http://nexml.svn.sourceforge.net/viewvc/nexml/trunk/nexml/examples/treebase-record.xml?revision=1697&view=markup


Re: [Treebase-devel] analysis metadata in TreeBASE, some experiments

From: William P. <wil...@ya...> - 2011-06-14 15:03:25

Interesting solution. MIAPA will definitely need to tackle analysis info, even if it's at a rudimentary level (like what produced what from what). TreeBASE used to have only analysis records -- the analysis step records were introduced with the idea that submitters might want to describe more complex analyses in a multi-step fashion (e.g. took matrix x and produced set of trees y, took set of trees y and produced consensus tree z, etc.). But alas, very few submitters have taken advantage of this -- most just have one step per analysis. Yet having multiple steps adds a slightly greater mouse-clicking burden. Maybe we should consider abandoning the multi-step design and collapsing it down to single-step analysis entries? How might MIAPA come to an opinion on matters like this?
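
To illustrate what collapsing would mean for the matrix x / trees y / consensus tree z example, here is a rough Python sketch. Treating every intermediate product as an output of the merged step is my own assumption about how a collapse might work; it is not an existing TreeBASE feature.

```python
from typing import List, Set, Tuple

Step = Tuple[Set[str], Set[str]]  # (inputs, outputs)


def collapse(steps: List[Step]) -> Step:
    """Merge chained steps into a single step.

    Anything produced inside the chain counts as output; only data that
    no step produced counts as input.
    """
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for inputs, outputs in steps:
        consumed |= inputs
        produced |= outputs
    return consumed - produced, produced


chain = [
    ({"matrix x"}, {"trees y"}),          # tree search
    ({"trees y"}, {"consensus tree z"}),  # consensus
]
single_inputs, single_outputs = collapse(chain)
print(single_inputs, single_outputs)
```

The merged step still accounts for all three data objects, but it no longer records that z was derived from y rather than directly from x.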

bp


Re: [Treebase-devel] analysis metadata in TreeBASE, some experiments

From: Rutger V. <R....@re...> - 2011-06-14 15:10:47

There isn't really a community process that decides what is or is not
MIAPA, so at this stage these sorts of experiments are to inform
people who are working on fleshing out the details of terms and
concepts expressed in MIAPA. The person who's most involved in
implementation details is Maryam, who I assumed was on the relevant
mailing list but who's cc'ed here as well.


Re: [Treebase-devel] analysis metadata in TreeBASE, some experiments

From: Enrico P. <epo...@cs...> - 2011-06-14 16:52:10

I don't believe the two things are exclusive - a single step is just a
"big step" that can be refined into a sequence of smaller steps if one
wants to.
I believe Maryam is currently looking at the "leaves" of this hierarchical
tree that decomposes the analysis (by looking at the individual tools that
can be used in the analysis).
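
A quick Python sketch of what I mean by a hierarchical decomposition follows. The class and step names are purely illustrative and are not taken from TreeBASE or MIAPA.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Step:
    name: str
    substeps: List["Step"] = field(default_factory=list)

    def leaves(self) -> Iterator["Step"]:
        """Yield the finest-grained steps, left to right."""
        if not self.substeps:
            yield self
        else:
            for sub in self.substeps:
                yield from sub.leaves()


# A single "big step" with no refinement.
big = Step("phylogenetic analysis")

# The same step, refined into a sequence of smaller steps.
refined = Step("phylogenetic analysis", [
    Step("tree search"),
    Step("consensus"),
])

print([s.name for s in big.leaves()])
print([s.name for s in refined.leaves()])
```

An unrefined step is its own only leaf, so single-step entries and multi-step entries fit the same structure.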

Enrico

-- 
Dept. Computer Science,
New Mexico State University
MSC CS, Box 30001, Las Cruces, NM 88003
Voice: 575-646-6239   Fax: 575-646-1002




