Re: [Treebase-devel] couple questions regarding JUnit test


On Thu, Jun 2, 2011 at 9:17 PM, Laurel Yohe <lol...@gm...> wrote:
> So I have been working a little more on the unit test and am getting a little
> closer I believe. I might just be a little confused as to what is what.
> Maybe if you can answer some questions it would help me out a little bit:
> 1) Can you clarify the purpose of the NexmlDocumentConverter() Class?

A NeXML document is a document in NeXML syntax. The document contains
taxa, character state matrices and trees.

TreeBASE submissions also contain taxa, character state matrices and
trees, in TreeBASE's own object representation.

NexmlDocumentConverter converts TreeBASE submissions into NeXML
documents and back.

> 2) What is the purpose of the setNexmlProject(Document) function?

In order to guarantee that a NeXML document is coherent in the way
taxa, matrices and trees refer to each other, these objects all need to
be instantiated from the document object. For example, to create a
NeXML character state matrix you have to do:

CategoricalMatrix xmlMatrix = document.createCategoricalMatrix(xmlOTUs);

This way it is clear which set of taxa the matrix refers to, and which
document it belongs to. Hence, any of the subclasses of the document
converter (such as NexmlMatrixConverter) need to have access to the
same document when they are converting the respective TreeBASE objects
they operate on. They do this by calling getDocument(), which will
return the document that was provided to setNexmlProject(Document) at
the start of the conversion process.
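
To illustrate the idea, here is a minimal sketch of the shared-document
pattern. All classes in it (Doc, OtuSet, Matrix, BaseConverter,
MatrixConverter) are hypothetical stand-ins written for this example;
they are not the real org.nexml.model or TreeBASE classes, and
setDocument() only mirrors the role of setNexmlProject(Document). The
point is that the document acts as the factory, so every converter that
shares it produces objects belonging to the same document.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedDocumentSketch {
    // Hypothetical stand-ins; these are NOT the real org.nexml.model classes.
    static class OtuSet {
        final Doc owner;
        OtuSet(Doc owner) { this.owner = owner; }
    }

    static class Matrix {
        final Doc owner;
        final OtuSet otus;
        Matrix(Doc owner, OtuSet otus) { this.owner = owner; this.otus = otus; }
    }

    static class Doc {
        final List<Matrix> matrices = new ArrayList<>();

        OtuSet createOtuSet() { return new OtuSet(this); }

        // The document is the factory, so it can reject taxa from another document.
        Matrix createCategoricalMatrix(OtuSet otus) {
            if (otus.owner != this) {
                throw new IllegalArgumentException("OTU set belongs to another document");
            }
            Matrix matrix = new Matrix(this, otus);
            matrices.add(matrix);
            return matrix;
        }
    }

    // Base converter holds the shared document (mirrors setNexmlProject/getDocument).
    static class BaseConverter {
        private Doc document;
        void setDocument(Doc document) { this.document = document; }
        Doc getDocument() { return document; }
    }

    // A subclass only reaches the document through getDocument().
    static class MatrixConverter extends BaseConverter {
        Matrix convert(OtuSet otus) {
            return getDocument().createCategoricalMatrix(otus);
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        MatrixConverter converter = new MatrixConverter();
        converter.setDocument(doc);
        Matrix matrix = converter.convert(doc.createOtuSet());
        System.out.println(matrix.owner == doc);
        System.out.println(doc.matrices.size());
    }
}
```

In this sketch, passing a taxon set created by a different document
raises an exception, which is one simple way such a factory could keep
cross-references coherent.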

> 3) In the NexmlMatrixConverter(), there is the same method for different
> types of Matrices--i.e. CategoricalMatrix, StandardMatrix, CharacterMatrix,
> etc. It seems that I want to focus on testing public
> org.nexml.model.Matrix<?> fromTreeBaseToXml(CharacterMatrix tbMatrix)
> because it calls the populateXmlMatrix() function. Within this function
> though it checks if the CharacterMatrix is an instance of other types of
> matrices and I am just getting a little confused. What makes one matrix
> different from the other/why are there so many different types of matrices?

There are different types of TreeBASE character matrices because they
contain different data types which have different mappings onto the
database and have different optimizations. For example, a matrix with
continuous-valued character states is stored differently in the
database than a matrix with DNA sequences, because DNA sequences are
typically long (many characters) and therefore costly to reconstitute
if each cell in the matrix is stored as an individual database record.
On the other hand, continuous matrices are usually smaller, and
relational databases have optimizations to store numbers of different
types (longs, doubles, etc.).

For NeXML there are similar reasons for needing to distinguish between
different types of matrices. For example, NeXML is designed to be
explicit in enumerating all possible character states that occur in a
matrix ahead of the actual matrix itself (in the <format/> element) -
but for continuous matrices you can't enumerate all possible states,
because that would be an infinite number.

The converter needs to deal with these separate cases.
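
As a rough illustration of why the converter branches on matrix type,
here is a minimal sketch. The classes TbMatrix, TbDiscreteMatrix,
TbContinuousMatrix and XmlMatrix are hypothetical stand-ins invented for
this example, not the actual TreeBASE or NeXML classes, and convert() is
not the real populateXmlMatrix(). It only shows the distinction
described above: a discrete matrix can have its states enumerated up
front, a continuous one cannot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class MatrixDispatchSketch {
    // Hypothetical stand-ins for TreeBASE matrix types; not the real classes.
    static abstract class TbMatrix { }

    static class TbDiscreteMatrix extends TbMatrix {
        final List<String> rows = new ArrayList<>();   // e.g. DNA sequences
    }

    static class TbContinuousMatrix extends TbMatrix {
        final List<double[]> rows = new ArrayList<>(); // numeric cells
    }

    // Hypothetical XML-side result: a type label plus the states declared up front.
    static class XmlMatrix {
        final String type;
        final List<String> declaredStates;
        XmlMatrix(String type, List<String> declaredStates) {
            this.type = type;
            this.declaredStates = declaredStates;
        }
    }

    static XmlMatrix convert(TbMatrix tbMatrix) {
        if (tbMatrix instanceof TbDiscreteMatrix) {
            // Discrete data: collect every state that occurs, in sorted order.
            TreeSet<String> states = new TreeSet<>();
            for (String row : ((TbDiscreteMatrix) tbMatrix).rows) {
                for (char symbol : row.toCharArray()) {
                    states.add(String.valueOf(symbol));
                }
            }
            return new XmlMatrix("discrete", new ArrayList<>(states));
        } else if (tbMatrix instanceof TbContinuousMatrix) {
            // Continuous data: the state space is infinite, so nothing is enumerated.
            return new XmlMatrix("continuous", new ArrayList<>());
        }
        throw new IllegalArgumentException("Unsupported matrix type: " + tbMatrix);
    }

    public static void main(String[] args) {
        TbDiscreteMatrix dna = new TbDiscreteMatrix();
        dna.rows.add("ACGT");
        dna.rows.add("AACC");
        System.out.println(convert(dna).declaredStates);
        System.out.println(convert(new TbContinuousMatrix()).declaredStates);
    }
}
```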

> 4) In my test, I am creating a NexmlMatrixConverter object by using a test
> studyId "1787". (It was a studyId that a different unit test used). When I
> try to "getMatrices()" it comes up as null. I don't know if this is because
> there are no matrices associated with it, if it is not of type
> CharacterMatrix so it won't work, or something else.

It is something else. What we are trying to do is convert TreeBASE
objects to XML. For this we use the method fromTreeBaseToXml.

You used fromXmlToTreeBase, which is intended to convert NeXML objects
into TreeBASE objects. I've corrected your NexmlMatrixConverter class
to demonstrate how this is supposed to work.

> If it is the first case,
> is there a different test study that I could try that you know works? Is
> there an issue with getting matrices from a Study? I saw that getStudy()
> method and setStudy() method were commented out for RowSegment() class so I
> didn't know if that might be affecting things.
> I think those are my four major classes. I committed the code nonetheless.
> I also added a few lines of code in the NexmlMatrixConverter class to add
> more annotations. However, I am not sure if I have the adding of the
> annotation in the right format. For example:
>
> String accessionNumber = tbSegment.getSpecimenLabel().getGenBankAccession();
> if ( null != accessionNumber) {
>     xmlOTU.addAnnotationValue("DwC:StringGenbankAcessionNumber",
>         Constants.DwCURI, accessionNumber);
> }
>
> I was making my best educated guess. Alright that is all for now. In the
> meantime, I will break from this and work a little on the charsets issue.

That was a pretty good guess. Looking forward to your charsets results.

> Hope you are having a good week!

Thank you, you too!

Rutger

-- 
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com