From: Mark H. <mth...@gm...> - 2014-01-02 20:15:46
Hi,

The example is the ID Tl241048 in

http://treebase.org/treebase-web/search/downloadAStudy.html?id=10000&format=nexml

The NEXUS output has multiple taxa blocks, and the taxon name Cardamine_alpina_DAE2_R325 (which is associated with the offending otu elements in the NeXML form) occurs in more than one of them. In the (Mesquite-ized) form of NEXUS that allows multiple TAXA blocks, the block TITLE acts like a namespace, so identical taxon labels are not a problem in NEXUS. In NeXML, however, ids are supposed to be unique throughout the file.

I suspect this difference in how the two formats handle disambiguation is the source of the bug.

All the best,
Mark

--
Mark Holder

mth...@gm...
mth...@ku...
http://phylo.bio.ku.edu/mark-holder

==============================================
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone: 785.864.5789
fax (shared): 785.864.5860
==============================================
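One way to confirm the symptom Mark describes is to scan a downloaded NeXML file for id values that occur more than once. The Python sketch below uses only the standard library. The SAMPLE document is a hand-made, hypothetical fragment (not actual TreeBASE output) that reuses the otu id and label from this report; the otus ids in it are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(nexml_text: str) -> list[str]:
    """Return, sorted, every id attribute value that occurs more than once."""
    root = ET.fromstring(nexml_text)
    # root.iter() walks every element in the document, including the root itself.
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical minimal fragment: two otus blocks that reuse one otu id.
SAMPLE = """<nex:nexml xmlns:nex="http://www.nexml.org/2009"
    xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="otus1">
    <otu id="Tl241048" label="Cardamine_alpina_DAE2_R325"/>
  </otus>
  <otus id="otus2">
    <otu id="Tl241048" label="Cardamine_alpina_DAE2_R325"/>
  </otus>
</nex:nexml>"""

print(find_duplicate_ids(SAMPLE))  # ['Tl241048']
```

For a downloaded file, read its contents into a string first (or adapt the function to use `ET.parse(path).getroot()`).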
From: Rutger V. <rut...@gm...> - 2014-01-13 12:54:01
For posterity and in the interest of completeness:

When TreeBASE reads such a NEXUS file with namespaced taxon labels (TaxonLabel), it scopes the strings to the study and attempts to give them normalized references at least to the level of a commonly occurring "variant" of a name (TaxonVariant), and ideally (most commonly) to a known, true Taxon.

Subsequently, references to these TaxonLabel objects are collected into multiple TaxonLabelSet bins, which represent the encountered taxa blocks.

When NeXML is generated, the generic (not entirely thought-through) approach has been to generate all XML identifiers from the primary keys in the database. This can fail to produce unique XML IDs for otu elements in studies where the same TaxonLabel occurs in multiple TaxonLabelSet bins (which are serialized to multiple "otus" elements).

A possible fix would be to generate XML IDs for otu elements by combining the TaxonLabel primary key with the TaxonLabelSet PK. Unfortunately, this will be a bit messy to get right, because the ID references (from tree node to otu, and from sequence to otu) need to generate the same string as well.
--
Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com
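Rutger's proposed fix can be sketched as a single ID-minting function that both the otu serializer and every reference (tree node to otu, sequence to otu) call, so the two sides cannot drift apart. This is a minimal Python sketch, not TreeBASE code: the function names, the TreeNode class, and the "Tl<label PK>_Tls<set PK>" id format are all hypothetical. The PK 241048 is taken from the reported id; the other numbers are made up.

```python
from dataclasses import dataclass


def otu_xml_id(taxon_label_pk: int, taxon_label_set_pk: int) -> str:
    """Mint an otu id that is unique per (TaxonLabel, TaxonLabelSet) pair.

    Hypothetical format: "Tl<label PK>_Tls<set PK>".
    """
    return f"Tl{taxon_label_pk}_Tls{taxon_label_set_pk}"


@dataclass(frozen=True)
class TreeNode:
    """Hypothetical stand-in for a tree node that points at a TaxonLabel."""
    node_pk: int
    taxon_label_pk: int


def otus_block(set_pk: int, label_pks: list[int]) -> tuple[str, list[str]]:
    """Return the otus id and the otu ids for one TaxonLabelSet."""
    return f"Tls{set_pk}", [otu_xml_id(pk, set_pk) for pk in label_pks]


def node_otu_ref(node: TreeNode, tree_set_pk: int) -> str:
    # Build the reference from the same (label PK, set PK) pair used for the
    # otu element; calling the shared helper keeps the two strings identical.
    return otu_xml_id(node.taxon_label_pk, tree_set_pk)


# The same TaxonLabel (241048) appears in two TaxonLabelSets (1 and 2).
_, otus1 = otus_block(1, [241048, 241049])
_, otus2 = otus_block(2, [241048])
print(otus1 + otus2)  # ['Tl241048_Tls1', 'Tl241049_Tls1', 'Tl241048_Tls2']
print(node_otu_ref(TreeNode(7, 241048), 2))  # Tl241048_Tls2
```

The design point is that no caller formats an otu id by hand: if sequence-to-otu references also go through `otu_xml_id`, the "messy" part reduces to making sure each reference knows which TaxonLabelSet its tree or matrix was serialized against.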
From: William P. <wil...@ya...> - 2014-01-14 01:18:24
In an ideal world, submitters would upload matrices and trees in the same Mesquite project, all sharing the same taxon block. This would prevent the proliferation of redundant taxon sets. In reality, matrices and trees are often uploaded separately, and so long as the set of taxon labels in the trees is equal to, or a subset of, the set in the matrices, the trees can be linked to the matrices by way of an analysis (a rule that is, regrettably, incompatible with *BEAST or Mesquite's taxon-association table).

When users click "download submission as NEXUS," any redundant taxon sets are consolidated into single taxon blocks that are shared by compatible trees and matrices. This does not happen when users click "download submission as NeXML."

Some day it would be nice to express the analysis in NeXML, i.e. the relationships among matrices and trees (such as which is derived from which), along with analysis metadata. Given TreeBASE's rule that all objects in an analysis share the same taxon set, the analysis records could form the basis for shared taxon label elements. (Of course, this is all very kludge-like; ideally we would simply rely on a more direct object-element mapping.)

bp
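William's suggestion of using analysis records as the basis for shared taxa can be sketched as computing one label set per analysis and checking the tree-subset rule against it. This is a hypothetical Python sketch: the input shape (sets of TaxonLabel PKs) and the assumption that tree labels are checked against the union of the analysis's matrices are assumptions, not TreeBASE's documented behaviour.

```python
def shared_otus_for_analysis(
    matrix_labels: list[set[int]], tree_labels: list[set[int]]
) -> frozenset[int]:
    """Return one TaxonLabel-PK set that all matrices and trees in an analysis can share.

    Assumption: each tree's labels must be a subset of the union of the
    analysis's matrix labels; otherwise the tree cannot use the shared block.
    """
    shared = set().union(*matrix_labels)
    for i, labels in enumerate(tree_labels):
        missing = labels - shared
        if missing:
            raise ValueError(f"tree {i} has labels absent from the matrices: {sorted(missing)}")
    return frozenset(shared)


# Two matrices and one tree from the same (made-up) analysis.
print(sorted(shared_otus_for_analysis([{1, 2, 3}, {3, 4}], [{1, 4}])))  # [1, 2, 3, 4]
```

Serializing that set once as a single otus element per analysis, and pointing every tree and matrix in the analysis at it, would avoid emitting the same TaxonLabel in several otus blocks.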