[Treebase-devel] refactorings

Hi all,

I have done some refactorings on the NeXML generation this weekend, which
also involved some changes to the nexml.jar. The upshot is two-fold:

* generation of character state matrices should go faster now, though that
recent study with an nchar of ~40k amino acids still hangs. Perhaps we
need to make the NeXML output more modular, such that the entire study dump
doesn't contain the full matrices, but instead an XInclude directive that
points to each matrix's purl?

* row objects can now be annotated in a way that fits better with our data
model, where different segments can be annotated separately. What we can now
do is attach an annotation to a row, and then annotate that annotation. So
a row would get multiple "tb:rowSegment" annotations, and each of these
segments is in turn annotated with its start and end index and whichever
other metadata the submitter has supplied (e.g. georeferencing and GenBank
accession numbers).
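
On the first point, a minimal sketch of what the modular study dump could
look like: one xi:include per matrix instead of the matrix itself. This is
Python rather than our Java code, purely for illustration. The function name
and the example URLs are hypothetical, and I have not checked whether the
NeXML schema or our consumers would accept xi:include at this position.

```python
import xml.etree.ElementTree as ET

NEXML_NS = "http://www.nexml.org/2009"
XI_NS = "http://www.w3.org/2001/XInclude"

# Use readable prefixes when serializing.
ET.register_namespace("nex", NEXML_NS)
ET.register_namespace("xi", XI_NS)


def study_with_includes(matrix_purls):
    """Return a study document that points at each matrix via xi:include.

    matrix_purls: list of URLs (assumed to resolve to one matrix each).
    """
    root = ET.Element(f"{{{NEXML_NS}}}nexml")
    for purl in matrix_purls:
        # The full matrix is not serialized here, only a pointer to it.
        ET.SubElement(root, f"{{{XI_NS}}}include", {"href": purl})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs, not real TreeBASE purls.
    print(study_with_includes(["http://example.org/matrix/M1"]))
```

A client that wants everything would then need an XInclude-aware parser, or
would fetch the matrices it cares about one by one.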

Here is a recent example that demonstrates this:
http://treebase-dev.nescent.org/treebase-web/search/downloadAStudy.html?id=12335&format=nexml
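
In outline, the nesting described above (an annotation on the row, which is
itself annotated) could be sketched as follows. This is Python for
illustration only, not the nexml.jar API. Only "tb:rowSegment" comes from the
actual output; the property names tb:startIndex, tb:endIndex and
tb:accession, the helper name and the values are made up for the example,
and namespaces and xsi:type attributes are left out for brevity.

```python
import xml.etree.ElementTree as ET


def annotate_row(row, segments):
    """Attach one tb:rowSegment annotation per segment to a <row> element.

    Each segment is a dict of property name -> value; every entry becomes
    a nested <meta> inside that segment's annotation.
    """
    for segment in segments:
        # Outer annotation: marks one segment of the row.
        seg_meta = ET.SubElement(row, "meta", {"rel": "tb:rowSegment"})
        for prop, value in segment.items():
            # Inner annotations: metadata about that segment.
            ET.SubElement(
                seg_meta, "meta", {"property": prop, "content": str(value)}
            )
    return row


if __name__ == "__main__":
    row = ET.Element("row", {"id": "row1"})
    annotate_row(
        row,
        [
            # Hypothetical property names and placeholder values.
            {"tb:startIndex": 0, "tb:endIndex": 499, "tb:accession": "EXAMPLE0001"},
            {"tb:startIndex": 500, "tb:endIndex": 999},
        ],
    )
    print(ET.tostring(row, encoding="unicode"))
```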

There are still a couple of small commits that need to happen (among them:
TreebaseIDString needs to store a prefix for MatrixRow objects so that
we can attach the ID to the <row> element) before I would suggest pushing
this to production, by the way.

Cheers,

Rutger

-- 
Dr. Rutger A. Vos
Bioinformaticist
NCB Naturalis
Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com