From: <sfr...@us...> - 2011-05-01 00:57:51
Revision: 825
          http://treebase.svn.sourceforge.net/treebase/?rev=825&view=rev
Author:   sfrgpiel
Date:     2011-05-01 00:57:43 +0000 (Sun, 01 May 2011)

Log Message:
-----------
Additional modifications to the data management plan text

Modified Paths:
--------------
    trunk/treebase-web/src/main/webapp/WEB-INF/pages/dataMan.jsp
    trunk/treebase-web/src/main/webapp/common/sidebarLeft.jsp

Modified: trunk/treebase-web/src/main/webapp/WEB-INF/pages/dataMan.jsp
===================================================================
--- trunk/treebase-web/src/main/webapp/WEB-INF/pages/dataMan.jsp 2011-04-30 22:03:40 UTC (rev 824)
+++ trunk/treebase-web/src/main/webapp/WEB-INF/pages/dataMan.jsp 2011-05-01 00:57:43 UTC (rev 825)
@@ -4,26 +4,33 @@
 <div class="gutter">
 <h1>NSF Data Management Plan</h1>
-<p>To foster the sharing and dissemination of data produced by sponsored research, the National Science Foundation requires a data management plan for all proposals. At a minimum, these data consist of whatever is necessary to validate research findings by the scientific community, which includes (1) analyzed data, and metadata that (2) provide provenance and (3) define how the data were generated. </p>
-<h2>Data Ingest and Storage</h2>
-<p>The three kinds of data required by NSF are accepted by TreeBASE, whether submitted directly to TreeBASE or indirectly by way of <a href="http://datadryad.org/" target="_blank">Dryad</a>. For (1), we accept NEXUS character block data with datatypes standard, continuous, DNA, RNA, and protein, and non-reticulating phylogenetic trees with branch lengths and clade support values. For (2) we parse and store character labels and state labels in submitted NEXUS files and we map taxon labels to NCBI and uBio external taxonomies. Additionally, we accept the following metadata: museum specimen numbers in accordance with the Registry of Biological Repositories (<a href="http://www.biorepositories.org" target="_blank">RBR</a>), Genbank accession numbers, other accession numbers, and Darwin Core compatible specimen metadata: collecting date, collector, latitude/longitude, elevation, country, state, and locality. For (3), we store and share the original uploaded NEXUS files (including any program-specific command blocks that can define substitution models and search parameters) as well as offer metadata files in analysis description records for annotating software, algorithm, and commands used. TreeBASE only shares data that are linked to a manuscript that is accepted by a peer reviewed publication (e.g. journal article, reviewed book or book section, academic thesis accepted by a thesis committee, etc). </p>
+<p>To foster the sharing and dissemination of data produced by sponsored research, the National Science Foundation requires a data management plan for all proposals. The kinds of data that must be shared generally include whatever the scientific community needs to validate research findings. In particular, researchers must present a plan to share (1) analyzed data, (2) metadata that provide provenance information, and (3) metadata that describe how the data were generated. </p>
+<h2>Data Submission and Storage</h2>
+<p>For phylogenetics, the three kinds of data mentioned above and required by NSF are all accepted by TreeBASE, whether submitted directly to TreeBASE or indirectly by way of <a href="http://datadryad.org/" target="_blank">Dryad</a>. For data type (1), we accept NEXUS formatted data with characters of datatype standard, continuous, DNA, RNA, and protein, and non-reticulating phylogenetic trees with branch lengths and clade support values. For metadata type (2) we parse and store morphological character labels and state labels in submitted NEXUS files and we map taxon labels to NCBI and uBio external taxonomies. Additionally, we accept the following metadata: museum specimen numbers in accordance with the Registry of Biological Repositories (<a href="http://www.biorepositories.org" target="_blank">RBR</a>), Genbank accession numbers, other accession numbers, and Darwin Core compatible specimen metadata: collecting date, collector, latitude/longitude, elevation, country, state, and locality. For metadata type (3), we store and share the original uploaded NEXUS files (including any program-specific command blocks that can define substitution models and search parameters) as well as provide data entry fields to describe software, algorithm, and commands used. TreeBASE only shares data that are linked to a manuscript that is accepted by a peer reviewed publication (e.g. journal article, reviewed book or book section, or academic thesis approved by a thesis committee). </p>
 <h2>Data Integrity and Verification</h2>
-<p>TreeBASE helps to certify data integrity by: (A) only accepting NEXUS data that are successfully parsed by a server-side headless version of <a href="http://mesquiteproject.org/" target="_blank">Mesquite</a>, (B) verifying that taxon labels in matrices and relevant trees are consistent, (C) verifying that data objects are not 'orphaned' (i.e. unlinked to an analysis), and (D) verifying that taxon labels are recognizable by biologists, spelled correctly, and mapped to external taxonomies whenever possible. TreeBASE provides a special advanced access URL for anonymous reviewers and referees to provide additional quality control. Although additional NSF requirements relating to provenance and how data were generated are not normally required or scrutinized by TreeBASE, submitters who flag their submission as NSF-sponsored data will receive special attention by our staff. In these cases, TreeBASE staff will check to make sure that provenance and analysis metadata are adequately provided, and, as needed, communicate with the submitter and assist in properly formatting and ingesting these data. </p>
+<p>TreeBASE helps to certify data integrity by: </p>
+<ul>
+ <li>Validating submitted NEXUS files by parsing them with <a href="http://mesquiteproject.org/" target="_blank">Mesquite</a> on the TreeBASE server</li>
+ <li>Verifying that taxon labels in matrices and relevant trees are consistent</li>
+ <li>Verifying that data objects are not 'orphaned' (i.e. unlinked to an analysis)</li>
+ <li>Verifying that taxon labels are recognizable by biologists, spelled correctly, and mapped to external taxonomies whenever possible</li>
+</ul>
+<p>TreeBASE provides an advanced access URL for anonymous reviewers and referees to provide additional quality control before the data are made public. Although additional NSF requirements relating to provenance and how data were generated are not normally required or scrutinized by TreeBASE staff, submitters who flag their submission (in the submission notes section) as NSF-sponsored data will receive special attention by our staff. In these cases, TreeBASE staff will check to make sure that provenance and analysis metadata are adequately provided, and, as needed, communicate with the submitter and assist in properly formatting and ingesting these data. </p>
 <h2>Data Standards and Dissemination</h2>
-<p>TreeBASE plans to remain in compliance with the emerging, but still evolving, standard of Minimal Information for a Phylogenetic Analysis (<a href="http://www.nescent.org/sites/evoio/MIAPA" target="_blank">MIAPA</a>). In addition, TreeBASE publishes persistant and resolvable globally unique identifiers for all major data objects and disseminates data and metadata using commonly accepted standards. A Restful <a href="http://www.nescent.org/wg/evoinfo/index.php?title=PhyloWS" target="_blank">PhyloWS</a> API exposes metadata using RSS feeds in RDF; a <a href="http://www.nexml.org/" target="_blank">NeXML</a> serialization exposes data marked up with metadata using published vocabularies and fully qualified URIs in compliance with <a href="http://linkeddata.org/" target="_blank">Linked Data</a> standards. Basic record metadata are published through an OIA-PMH service and records are mirrored by Dryad, which provides a secondary Dryad <a href="http://www.datacite.org" target="_blank">DataCite DOI</a>. However, for most people in the scientific community, data will be retrieved using the web user interface and downloaded in the NEXUS format, and metadata will be downloaded in tab-separated text format. </p>
+<p>TreeBASE plans to remain in compliance with the emerging, but still evolving, standard of Minimal Information for a Phylogenetic Analysis (<a href="http://www.nescent.org/sites/evoio/MIAPA" target="_blank">MIAPA</a>). In addition, TreeBASE publishes persistant and resolvable globally unique identifiers (GUIDs) for all major data objects and disseminates data and metadata using commonly accepted standards. A Restful <a href="http://www.nescent.org/wg/evoinfo/index.php?title=PhyloWS" target="_blank">PhyloWS</a> API exposes metadata using RSS feeds in RDF; a <a href="http://www.nexml.org/" target="_blank">NeXML</a> serialization exposes data marked up with metadata using published vocabularies and fully qualified URIs in compliance with <a href="http://linkeddata.org/" target="_blank">Linked Data</a> standards. Basic record metadata are published through an OIA-PMH service, and TreeBASE records are mirrored by Dryad, which provides a secondary Dryad <a href="http://www.datacite.org" target="_blank">DataCite DOI</a>. However, for most people in the scientific community, data will be retrieved using the web user interface and downloaded in the NEXUS format, while metadata can be downloaded separately in a tab-separated text format. </p>
 <h2>Data Persistance</h2>
-<p>Although no data service can guarantee indefinite persistance, TreeBASE will make every effort to preserve it services as long as possible. Additionally, the Articles of Incorporation of the <a href="http://www.phylofoundation.org" target="_blank">Phyloinformatics Research Foundation</a>, which oversees TreeBASE activities, specifies that if dissolution is ever required the assets will be transferred to a similar entity with a comparable mission. </p>
+<p>Although no data service can guarantee indefinite persistance, TreeBASE will make every effort to preserve its services as long as possible. Additionally, the Articles of Incorporation of the <a href="http://www.phylofoundation.org" target="_blank">Phyloinformatics Research Foundation</a>, which oversees TreeBASE activities, specify that if dissolution is ever required the assets will be transferred to a similar entity with a comparable mission. </p>
 <h2>Preparing a Data Management Plan for NSF</h2>
-<p>Scientists are welcome to designate TreeBASE as their selected repository and dissemination service for phylogenetic data generated by sponsored research. In this document, the following should be mentioned:</p>
+<p>Scientists are welcome to designate TreeBASE as their selected repository and dissemination service for phylogenetic data generated by sponsored research. In their Data Management Plan, we suggest that the following be mentioned:</p>
 <ul>
- <li>Specify the name(s) of the person(s) responsible for preparing the data matrices, trees, and metadata for submission to TreeBASE.</li>
- <li>Identify the kinds of data that will be submitted, including provenance and analysis metadata as outlined above. For metadata not accepted by TreeBASE (e.g. digital images of specimens), identify other repositories where these will be stored (e.g. <a href="http://www.morphbank.net/" target="_blank">Morphbank</a> or <a href="http://www.morphobank.org/" target="_blank">Morphobank</a>), and indicate how these data objects will be linked between TreeBASE data and the other repository (e.g. using shared specimen catalog numbers). </li>
- <li>For your phylogenetic data, you can report that your data will be serialized using TreeBASE's data formats: NEXUS, for character and tree data alone, and NeXML for these data plus marked up with supplied metadata (e.g. basic Dublin Core publication data, basic Darwin Core specimen information, RBR collection codes and catalog numbers, uBio and NCBI taxon identification numbers, and Genbank accession numbers).</li>
- <li>Provide an overview of access and sharing. For your TreeBASE-submitted data, you can state that TreeBASE makes all data and metadata freely available to the public once the manuscript under review has been accepted by a peer-reviewed publisher. TreeBASE will allow for data embargo periods according to the policies of the journal, but once data are public they are assumed to be released to the public domain without any restrictions on reuse. We recommend that you state that you will provide TreeBASE's resolvable GUIDs for your deposited data in future progress reports to NSF, in relevant publications, and in your lab's web page. </li>
- <li>State that you will flag your submissions to TreeBASE as data subject to your data management plan so as to receive special attention by TreeBASE staff to help ensure that the data are richly annotated and fully compliant for maximum reuse in accordance with community standards in phylogenetics. </li>
+ <li>Specify the name(s) of person(s) responsible for preparing the data matrices, trees, and metadata for submission to TreeBASE.</li>
+ <li>Identify the kinds of data that will be submitted, including provenance and analysis metadata as outlined above. For metadata not accepted by TreeBASE (e.g. digital images of specimens), identify other repositories where these will be stored (e.g. <a href="http://www.morphbank.net/" target="_blank">Morphbank</a> or <a href="http://www.morphobank.org/" target="_blank">Morphobank</a>), and indicate how entities in TreeBASE and in the other repository will be linked (e.g. using shared specimen catalog numbers). </li>
+ <li>For data standards, you can state that your data will be serialized using TreeBASE's data formats: NEXUS, for character and tree data alone, and NeXML for character and tree data with additional metadata (e.g. basic Dublin Core publication data, Darwin Core specimen information, RBR collection codes and catalog numbers, uBio and NCBI taxon identification numbers, and Genbank accession numbers).</li>
+ <li>Provide an overview of access and sharing. For your TreeBASE-submitted data, you can state that TreeBASE makes all data and metadata freely available to the public once the manuscript under review has been accepted by a peer-review publication. TreeBASE will allow data embargo periods according to the policies of the journal, but once data are public they are assumed to be released to the public domain without any restrictions on reuse. We recommend that you state that you will provide TreeBASE's resolvable globally unique identifiers (GUIDs) for your deposited data in future progress reports to NSF, in relevant publications, and in your lab's web page. </li>
+ <li>State that you will flag your submissions to TreeBASE as data subject to your data management plan so as to receive special attention by TreeBASE staff to help ensure that the data are richly annotated and fully compliant for reuse in accordance with community standards in phylogenetics. </li>
 </ul>
-<p>TreeBASE suggests that for each submission of data from sponsored research you contribute at least $50 towards defraying the costs of storage and dissemination, as well as in support of the additional scrutiny by TreeBASE staff for NSF data management compliance. This fee is collected by the Phyloinformatics Research Foundation, which overseas TreeBASE activities. Anticipated cost can be budgeted under publication expenses on your grant proposal's budget. </p>
-<p><hr /></p>
+<p>TreeBASE suggests that for each submission of data from sponsored research you contribute at least $100 towards defraying the costs of storage and dissemination, as well as in support of the additional scrutiny by TreeBASE staff for NSF data management compliance. This fee is collected by the Phyloinformatics Research Foundation, which overseas TreeBASE activities. Anticipated costs can be budgeted under publication expenses in your grant proposal's budget. </p>
+<hr /></p>
 <table width="100%" border="0">
 <tr>
 <td width="50%" valign="top">Data storage contribution for sponsored research:</td>
@@ -48,4 +55,5 @@
 </tr>
 </table>
+
 </div>
\ No newline at end of file

Modified: trunk/treebase-web/src/main/webapp/common/sidebarLeft.jsp
===================================================================
--- trunk/treebase-web/src/main/webapp/common/sidebarLeft.jsp 2011-04-30 22:03:40 UTC (rev 824)
+++ trunk/treebase-web/src/main/webapp/common/sidebarLeft.jsp 2011-05-01 00:57:43 UTC (rev 825)
@@ -14,6 +14,7 @@
 <li><a href="<c:url value="/reference.html"/>"><fmt:message key="nav.references"/></a></li>
 </ul>
 </li>
+ <li><a href="<c:url value="/dataMan.html"/>"><fmt:message key="nav.dataman"/></a></li>
 <li><a href="<c:url value="/urlAPI.html"/>"><fmt:message key="nav.dataaccess"/></a></li>
 <li><a href="<c:url value="/journal.html"/>"><fmt:message key="nav.journals"/></a></li>
 <li><a href="<c:url value="/contact.html"/>"><fmt:message key="nav.contact"/></a></li>

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.