From: Rutger V. <R....@re...> - 2011-06-13 12:45:18
Attachments:
outfile.xml
Hi all,

over the weekend I did some experimentation with how additional metadata having to do with phylogenetic analyses stored by TreeBASE could be serialized. Attached is the result as produced by a test case that I committed to the TreeBASE source.

For context, here is how TreeBASE sees the world: every submission to TreeBASE consists of the results of one or more analyses. Each analysis consists of one or more analysis steps. For each step, we store the "algorithm" (e.g. neighbor joining) and the "software" (e.g. PAUP). Optional additional metadata can consist of a textual description of the algorithm, a version number and URL of the software, and a text string containing analysis step commands (perhaps something like a PAUP block).

Every analysis step has input and output data. These data can be trees and matrices. The set of taxa in the input must be a superset of the taxa in the output (i.e. some sort of taxon pruning is allowed, but new taxa cannot be introduced during an analysis step). All data that's accessible to third parties (i.e. all public, non-embargoed data) must be the input or output of at least one analysis step, i.e. we don't allow orphaned data in completed submissions.

In the attached example, I'm annotating the study (i.e. the root of the nexml document) to specify the permanent URLs of any associated analyses, and I annotate those analysis URLs with their respective analysis steps, specifying their PURLs and any additional metadata as described above. This is shown in lines 3-13.

Then, for every data object I specify for which analysis step(s) it is the input and/or output (a data object can be both input and output if analysis steps are chained together). This is shown in line 448 for a character state matrix and line 1849 for a tree.

This is all highly experimental but I figured I'd share it as a discussion piece for refining the actual implementation of MIAPA annotations.

Rutger

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com
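For readers who want the data model described in the message above in concrete form, here is a minimal Python sketch. It is not TreeBASE code: the class names, field names, and the two helper methods are assumptions made for illustration. It only encodes the two rules stated in the message, namely that a step may prune taxa but not introduce them, and that every data object must be the input or output of at least one step.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class DataObject:
    """A tree or a matrix. All field names here are hypothetical."""
    label: str
    kind: str  # "tree" or "matrix"
    taxa: FrozenSet[str]


@dataclass
class AnalysisStep:
    algorithm: str       # e.g. "neighbor joining"
    software: str        # e.g. "PAUP"
    inputs: List[DataObject] = field(default_factory=list)
    outputs: List[DataObject] = field(default_factory=list)
    commands: str = ""   # optional, e.g. a PAUP block

    def introduced_taxa(self) -> FrozenSet[str]:
        """Taxa found in the outputs but in none of the inputs.

        An empty result means the input taxa are a superset of the
        output taxa, i.e. the step only pruned (or kept) taxa.
        """
        in_taxa = frozenset().union(*(d.taxa for d in self.inputs))
        out_taxa = frozenset().union(*(d.taxa for d in self.outputs))
        return out_taxa - in_taxa


@dataclass
class Analysis:
    steps: List[AnalysisStep] = field(default_factory=list)


@dataclass
class Submission:
    analyses: List[Analysis] = field(default_factory=list)
    data: List[DataObject] = field(default_factory=list)

    def orphaned_data(self) -> List[DataObject]:
        """Data objects that are neither input nor output of any step."""
        used = {
            d
            for a in self.analyses
            for s in a.steps
            for d in s.inputs + s.outputs
        }
        return [d for d in self.data if d not in used]


# Illustrative data only; the labels and taxa are made up.
matrix = DataObject("M1", "matrix", frozenset({"A", "B", "C"}))
tree = DataObject("T1", "tree", frozenset({"A", "B"}))
stray = DataObject("T2", "tree", frozenset({"A"}))
step = AnalysisStep("neighbor joining", "PAUP", inputs=[matrix], outputs=[tree])
submission = Submission([Analysis([step])], data=[matrix, tree, stray])
```

In this sketch `step.introduced_taxa()` is empty because taxon C was only pruned, while `submission.orphaned_data()` reports the tree that no step refers to, which is the situation the message says is not allowed in a completed submission.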
From: Rutger V. <R....@re...> - 2011-06-13 12:58:39
It looks like the example file was blocked by the mailman software so I committed it to the nexml examples directory. Here it is:

http://nexml.svn.sourceforge.net/viewvc/nexml/trunk/nexml/examples/treebase-record.xml?revision=1697&view=markup
From: William P. <wil...@ya...> - 2011-06-14 15:03:25
Interesting solution. MIAPA will definitely need to tackle analysis info, even if it's at a rudimentary level (like what produced what from what).

TreeBASE used to only have analysis records -- the analysis step records were introduced with the idea that submitters might want the ability to describe more complex analyses in a multi-step fashion (e.g. took matrix x and produced set of trees y, took set of trees y and produced consensus tree z, etc). But alas, very few submitters have taken advantage of this -- most just have one step per analysis. Yet having multiple steps adds a slightly greater mouse-clicking burden. Maybe we should consider abandoning the multi-step design, and collapsing it down to single-step analysis entries?

How might MIAPA come to an opinion on matters like this?

bp
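To make the trade-off in the message above concrete, the following Python sketch collapses a chain of steps into a single-step entry. It is an illustration under stated assumptions, not a description of how TreeBASE stores anything: the `Step` class, the `collapse` function, the `keep_intermediates` flag, and the step labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One analysis step, reduced to labels for its inputs and outputs."""
    label: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def collapse(label: str, steps: List[Step],
             keep_intermediates: bool = False) -> Step:
    """Summarise a chain of steps as one step.

    Inputs of the summary are the data consumed but never produced
    inside the chain. Outputs are the data produced but never consumed,
    unless keep_intermediates is set, in which case every produced
    object is listed as an output.
    """
    produced = {o for s in steps for o in s.outputs}
    consumed = {i for s in steps for i in s.inputs}
    inputs = sorted(consumed - produced)
    outputs = sorted(produced if keep_intermediates else produced - consumed)
    return Step(label, tuple(inputs), tuple(outputs))


# The chain from the message: matrix x -> trees y -> consensus tree z.
chain = [
    Step("tree search", ("matrix x",), ("trees y",)),
    Step("consensus", ("trees y",), ("consensus tree z",)),
]
single = collapse("analysis", chain)
with_intermediates = collapse("analysis", chain, keep_intermediates=True)
```

With the default setting the intermediate tree set y disappears from the summary, so if y were public data it would no longer be attached to any step. Listing intermediates as outputs avoids that, at the cost of no longer recording which result was derived from which.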
From: Rutger V. <R....@re...> - 2011-06-14 15:10:47
There isn't really a community process that decides what is or is not MIAPA, so at this stage these sorts of experiments are meant to inform people who are working on fleshing out the details of terms and concepts expressed in MIAPA. The person who's most involved in implementation details is Maryam, who I assumed was on the relevant mailing list but who's cc'ed here as well.

On Tue, Jun 14, 2011 at 4:03 PM, William Piel <wil...@ya...> wrote:
> How might MIAPA come to an opinion on matters like this?
>
> --
> You received this message because you are subscribed to the Google
> Groups "MIAPA" group.
> For more options, visit this group at
> http://groups.google.com/group/miapa-discuss?hl=en
From: Enrico P. <epo...@cs...> - 2011-06-14 16:52:10
I don't believe the two things are exclusive - a single step is just a "big step" that can be refined into a sequence of smaller steps if one wants to. I believe Maryam is currently looking at the "leaves" of this hierarchical tree that decomposes the analysis (by looking at the individual tools that can be used in the analysis).

Enrico

--
Dept. Computer Science, New Mexico State University
MSC CS, Box 30001, Las Cruces, NM 88003
Voice: 575-646-6239 Fax: 575-646-1002

On 6/14/11 9:03 AM, "William Piel" <wil...@ya...> wrote:
> Maybe we should consider abandoning the multi-step design, and collapsing it down to single-step analysis entries?
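The hierarchical view described in the message above can be sketched as a small recursive structure. This is only one possible reading, written in Python for illustration: the `Step` class, its `substeps` field, and the `leaves` method are assumed names, and the step labels are taken from the matrix/trees/consensus example earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """A step that may optionally be refined into smaller steps."""
    label: str
    substeps: List["Step"] = field(default_factory=list)

    def leaves(self) -> List["Step"]:
        """Return the unrefined steps at the bottom of the hierarchy."""
        if not self.substeps:
            return [self]
        found: List["Step"] = []
        for sub in self.substeps:
            found.extend(sub.leaves())
        return found


# A submitter who records only one "big step":
coarse = Step("matrix x to consensus tree z")

# The same analysis, refined into a sequence of smaller steps:
refined = Step(
    "matrix x to consensus tree z",
    [
        Step("matrix x to trees y"),
        Step("trees y to consensus tree z"),
    ],
)
```

Under this reading, a single-step entry and a multi-step entry are the same kind of record at different levels of detail: `coarse.leaves()` returns the big step itself, while `refined.leaves()` returns the two smaller steps, which is where per-tool metadata would attach.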