From: William P. <wil...@ya...> - 2014-01-14 01:18:24
In an ideal world, submitters would upload matrices and trees in the same Mesquite project, all sharing the same taxon block. This would prevent the proliferation of redundant taxon sets. In reality, matrices and trees are often uploaded separately. So long as the set of taxon labels in the trees is equal to, or a subset of, the set in the matrices, the trees can be linked to the matrices by way of an analysis (a rule that is, regrettably, incompatible with *BEAST or Mesquite's taxon-association table).

When users click "download submission as NEXUS", any redundant taxon sets are consolidated into single taxon blocks that are shared by compatible trees and matrices. This does not happen when users click "download submission as NeXML."

Some day it would be nice to express the analysis in NeXML, i.e. the relationships among matrices and trees, such as which is derived from which, along with analysis metadata. Given TreeBASE's rule that all objects in an analysis share the same taxon set, the analysis records could form the basis for shared taxon label elements. (Of course, this is all very kluge-like; ideally we would simply rely on a more direct object-element mapping.)

bp

On Jan 13, 2014, at 8:53 PM, Rutger Vos <rut...@gm...> wrote:

> For posterity and in the interest of completeness:
>
> When TreeBASE reads such a NEXUS file with namespaced taxon labels (TaxonLabel) it will scope the strings to the study, and will attempt to give them normalized references to at least the level of a commonly-occurring "variant" of a name (TaxonVariant), and ideally (most commonly) a known, true Taxon.
>
> Subsequently, references to these TaxonLabel objects are collected into multiple TaxonLabelSet bins, which represent the encountered taxa blocks.
>
> When NeXML is generated, the generic (not entirely thought-through) approach has been to generate all XML identifiers from the primary keys in the database. This might fail to generate unique XML IDs for otu elements in studies where the same TaxonLabel occurs in multiple TaxonLabelSet bins (which are serialized to multiple "otus" elements).
>
> A possible fix would be to generate XML IDs for otu elements by combining the TaxonLabel primary key with the TaxonLabelSet PK. Unfortunately, this will be a bit messy to get right because the ID references (from tree node to otu, from sequence to otu) need to generate the correct string as well.
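
For what it's worth, the composite-ID fix suggested in the quoted message could look roughly like the sketch below. This is only an illustration, not TreeBASE code: the helper names `otu_xml_id` and `node_otu_ref`, the `otu` prefix, and the sample primary keys are all invented here. The point is that the element IDs and the references to them go through the same function, so the two cannot drift apart.

```python
def otu_xml_id(label_set_pk: int, label_pk: int) -> str:
    """Hypothetical helper: combine the TaxonLabelSet PK and the TaxonLabel PK
    into one string. The letter prefix keeps it a valid XML ID."""
    return f"otu{label_set_pk}_{label_pk}"


def node_otu_ref(label_set_pk: int, label_pk: int) -> str:
    """Hypothetical helper: reference from a tree node (or a sequence) to its otu.
    It reuses otu_xml_id so references always match the generated IDs."""
    return otu_xml_id(label_set_pk, label_pk)


# Made-up data: TaxonLabelSet PK -> PKs of the TaxonLabels collected in it.
# Labels 2 and 3 occur in both sets, which is the problematic case.
label_sets = {101: [1, 2, 3], 102: [2, 3]}

# IDs built from the TaxonLabel PK alone: duplicates across "otus" elements.
naive_ids = [f"otu{pk}" for pks in label_sets.values() for pk in pks]

# IDs built from both PKs: unique across the whole document.
composite_ids = [
    otu_xml_id(set_pk, pk) for set_pk, pks in label_sets.items() for pk in pks
]

print(len(naive_ids) - len(set(naive_ids)), "duplicate naive IDs")
print(len(composite_ids) - len(set(composite_ids)), "duplicate composite IDs")
```

The messy part mentioned above remains: every place that writes a reference needs to know which TaxonLabelSet the tree or matrix is bound to, not just the TaxonLabel.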