From: Rutger V. <rut...@gm...> - 2012-02-06 10:54:40
Hi all,

I refactored the NeXML generation this weekend, which also involved some changes to nexml.jar. The upshot is two-fold:

* Generation of character state matrices should be faster now, though that recent study with an nchar of ~40k amino acids still hangs things. Perhaps we need to make the NeXML output more modular, so that the full-study dump contains, in place of the full matrices, an XInclude directive that points to each matrix's PURL?

* Row objects can now be annotated in a way that fits our data model better, where different segments of a row can be annotated separately. We can now attach an annotation to a row and then annotate that annotation. So a row gets multiple "tb:rowSegment" annotations, and each of these segments is in turn annotated with its start and end index and whatever other metadata the submitter has supplied (e.g. georeferencing and GenBank accession numbers).

Here is a recent example that demonstrates this:
http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=12335&format=nexml

By the way, there are still a couple of small commits that need to happen before I would suggest pushing this to production (among them: TreebaseIDString needs to store a prefix for MatrixRow objects so that we can attach the ID to the <row> element).

Cheers,

Rutger

--
Dr. Rutger A. Vos
Bioinformaticist
NCB Naturalis
Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com
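
As an illustration of the nested row annotations described in this message, here is a minimal Python sketch that builds a <row> carrying several "tb:rowSegment" annotations, each of which is annotated in turn. It is not the nexml.jar code. The helper name annotate_row, the property names (tb:startIndex, tb:endIndex, tb:accession), the row ID, and the accession values are all hypothetical placeholders, and namespace declarations and xsi:type attributes are left out to keep the fragment short.

```python
import xml.etree.ElementTree as ET


def annotate_row(row, segments):
    """Attach one tb:rowSegment annotation per segment to a <row> element.

    Each segment is a dict of property name -> value. Every entry becomes
    a nested <meta> inside the segment's own <meta>, so the annotation
    itself is annotated.
    """
    for segment in segments:
        # Outer annotation: marks one segment of the row.
        segment_meta = ET.SubElement(row, "meta", {"rel": "tb:rowSegment"})
        for prop, value in segment.items():
            # Inner annotations: metadata about that segment.
            ET.SubElement(
                segment_meta,
                "meta",
                {"property": prop, "content": str(value)},
            )
    return row


# Hypothetical row with two separately annotated segments.
row = ET.Element("row", {"id": "row1"})
annotate_row(
    row,
    [
        {"tb:startIndex": 0, "tb:endIndex": 499, "tb:accession": "PLACEHOLDER1"},
        {"tb:startIndex": 500, "tb:endIndex": 999, "tb:accession": "PLACEHOLDER2"},
    ],
)

print(ET.tostring(row, encoding="unicode"))
```

The study download linked above shows what the real output looks like; the sketch only mirrors the row, segment, segment-metadata nesting.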