From: William P. <wil...@ya...> - 2011-05-16 17:04:32
Dear Mesquite:

TreeBASE currently offers the ability to attach metadata annotations to sections of rows in the MATRIX of a CHARACTERS block. This is like a row-specific CHARSET range. So, for example, if a CHARACTERS block consists of five concatenated genes, then for each row we can store five GenBank accession numbers, each with its own "begin" and "end" character number.

An annotation linked to a "begin" and "end" section of a row of characters can have the following tags:

  - Title (free text)
  - GenBank Accession Number
  - Other Accession Num (free text, e.g. culture number, etc.)
  - Sample Taxon Label (i.e., if the taxon used to sequence part of the matrix row is different from the taxon label for the entire row and tree OTU)
  - Basic Darwin Core and biorepositories.org museum specimen triple: Inst. Acronym; Collection Code; Catalog Number
  - Basic Darwin Core specimen info: Collector; Sample Date; Country; State; Locality; Latitude-Longitude; Elevation
  - Notes (free text)

Aside from NeXML, these annotations are downloaded via a tab-separated text file. For example, to obtain the annotations for this dataset, you need to use this link.

My question is: is there a way for us to download the NEXUS so that these annotations are readable/viewable by Mesquite? The logical place is in the NOTES block, and I notice that you offer this syntax:

    SUTM T = 4 N = genBankNumber S = AF284000; ...

but I think this obliges the GenBank number (1) to apply to the entire row in the matrix, and (2) to be attached to the TAXA block instead of to the CHARACTERS block. We frequently have instances where several CHARACTERS blocks hang off the same TAXA block, as well as instances where a single row of a MATRIX has several GenBank accession numbers, each associated with a different subsection of the sequence.
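To make the data model concrete, here is a minimal sketch in Python of a row-specific annotation of the kind described above. The class name, function name, 1-based inclusive ranges, character boundaries, and the second accession value are all illustrative assumptions; this is not TreeBASE's actual schema or export format.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentAnnotation:
    """Annotation attached to characters begin..end of one matrix row.

    Hypothetical structure for illustration; positions are assumed
    to be 1-based and inclusive.
    """
    row: int                      # taxon/row number within the matrix
    begin: int                    # first character covered
    end: int                      # last character covered
    tags: dict = field(default_factory=dict)

    def covers(self, char: int) -> bool:
        """Return True if the given character number falls in this segment."""
        return self.begin <= char <= self.end


def annotations_at(annotations, row, char):
    """Return every annotation that covers character `char` of row `row`."""
    return [a for a in annotations if a.row == row and a.covers(char)]


# Hypothetical example: row 4 is a concatenation of two genes.
# The ranges and the placeholder accession "XX000000" are made up.
example = [
    SegmentAnnotation(4, 1, 650, {"genBankNumber": "AF284000"}),
    SegmentAnnotation(4, 651, 1200, {"genBankNumber": "XX000000"}),
]
```

With this layout, a single row can carry several accession numbers, and a lookup such as `annotations_at(example, 4, 700)` returns only the segment whose range contains character 700. A per-taxon or per-character note cannot express that directly.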
An alternative to the SUTM note is to put something like this:

    AN T = 4 C = 1 AU = TreeBASE TF = ( CM AF284000 ) TF = ( R genBankNumber );

But that implies that the GenBank number attaches only to the first character in the row (instead of a larger chunk of the row, if not the entirety of it), so it is not ideal.

Can you recommend a syntax for attaching annotations to a range of characters of a particular row? Or do you have plans to offer this functionality?

regards,
Bill