From: Rutger V. <rut...@gm...> - 2011-06-06 14:21:09
On Thu, Jun 2, 2011 at 9:17 PM, Laurel Yohe <lol...@gm...> wrote:
> So I have been working a little more on the unit test and am getting a
> little closer, I believe. I might just be a little confused as to what
> is what. Maybe if you can answer some questions it would help me out a
> little bit:
>
> 1) Can you clarify the purpose of the NexmlDocumentConverter() Class?

A NeXML document is a document in NeXML syntax. The document contains
taxa, character state matrices and trees. TreeBASE submissions also
contain taxa, character state matrices and trees, in their own object
representation. NexmlDocumentConverter converts TreeBASE submissions
into NeXML documents and back.

> 2) What is the purpose of the setNexmlProject(Document) function?

In order to guarantee that a NeXML document is coherent in the way taxa,
matrices and trees refer to each other, these objects all need to be
instantiated from the document object. For example, to create a NeXML
character state matrix you have to do:

    CategoricalMatrix xmlMatrix = document.createCategoricalMatrix(xmlOTUs);

This way it is clear which set of taxa the matrix refers to, and which
document it belongs to. Hence, all of the subclasses of the document
converter (such as NexmlMatrixConverter) need to have access to the same
document when they are converting the respective TreeBASE objects they
operate on. They do this by calling getDocument(), which will return the
document that was provided to setNexmlProject(Document) at the start of
the conversion process.

> 3) In the NexmlMatrixConverter(), there is the same method for different
> types of Matrices--i.e. CategoricalMatrix, StandardMatrix,
> CharacterMatrix, etc. It seems that I want to focus on testing public
> org.nexml.model.Matrix<?> fromTreeBaseToXml(CharacterMatrix tbMatrix)
> because it calls the populateXmlMatrix() function. Within this function
> though it checks if the CharacterMatrix is an instance of other types of
> matrices and I am just getting a little confused.
> What makes one matrix different from the other/why are there so many
> different types of matrices?

There are different types of TreeBASE character matrices because they
contain different data types, which map onto the database differently
and are optimized differently. For example, a matrix with
continuous-valued character states is stored differently in the database
than a matrix with DNA sequences, because DNA sequences are typically
long (many characters) and therefore costly to reconstitute if each cell
in the matrix is stored as an individual database record. On the other
hand, continuous matrices are usually smaller, and relational databases
have optimizations to store numbers of different types (longs, doubles,
etc.).

For NeXML there are similar reasons for needing to distinguish between
different types of matrices. For example, NeXML is designed to be
explicit in enumerating all possible character states that occur in a
matrix ahead of the actual matrix itself (in the <format/> element) -
but for continuous matrices you can't enumerate all possible states,
because that would be an infinite number. The converter needs to deal
with these separate cases.

> 4) In my test, I am creating a NexmlMatrixConverter object by using a
> test studyId "1787". (It was a studyId that a different unit test used).
> When I try to "getMatrices()" it comes up as null. I don't know if this
> is because there are no matrices associated with it, if it is not of
> type CharacterMatrix so it won't work, or something else.

It is something else. What we are trying to do is convert TreeBASE
objects to XML. For this we use the method fromTreeBaseToXml. You used
fromXmlToTreeBase, which is intended to convert NeXML objects into
TreeBASE objects. I've corrected your NexmlMatrixConverter class to
demonstrate how this is supposed to work.

> If it is the first case, is there a different test study that I could
> try that you know works?
> Is there an issue with getting matrices from a Study? I saw that the
> getStudy() method and setStudy() method were commented out for the
> RowSegment() class so I didn't know if that might be affecting things.
>
> I think those are my four major classes. I committed the code
> nonetheless.
>
> I also added a few lines of code in the NexmlMatrixConverter class to
> add more annotations. However, I am not sure if I have the adding of
> the annotation in the right format. For example:
>
>     String accessionNumber = tbSegment.getSpecimenLabel().getGenBankAccession();
>     if ( null != accessionNumber) {
>         xmlOTU.addAnnotationValue("DwC:StringGenbankAcessionNumber",
>                 Constants.DwCURI, accessionNumber);
>     }
>
> I was making my best educated guess. Alright that is all for now. In the
> meantime, I will break from this and work a little on the charsets
> issue.

That was a pretty good guess. Looking forward to your charsets results.

> Hope you are having a good week!

Thank you, you too!

Rutger

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com
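
To illustrate the answer to question 2: the following is a minimal,
self-contained Java sketch of the shared-document pattern, not the actual
TreeBASE or NeXML code. The classes Document, OTUs, Matrix, Converter and
MatrixConverter are hypothetical stand-ins written for this example; only
the names setNexmlProject, getDocument and createCategoricalMatrix are
taken from the message above, and their real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedDocumentSketch {

    // Hypothetical stand-in for a NeXML document: the only factory for its parts.
    static class Document {
        private final List<Matrix> matrices = new ArrayList<>();

        OTUs createOTUs(String label) {
            return new OTUs(this, label);
        }

        // The matrix is created from the document and tied to one set of taxa.
        Matrix createCategoricalMatrix(OTUs otus) {
            if (otus.owner != this) {
                throw new IllegalArgumentException("OTUs belong to a different document");
            }
            Matrix matrix = new Matrix(this, otus);
            matrices.add(matrix);
            return matrix;
        }

        int matrixCount() {
            return matrices.size();
        }
    }

    // Hypothetical set of taxa; remembers which document created it.
    static class OTUs {
        final Document owner;
        final String label;

        OTUs(Document owner, String label) {
            this.owner = owner;
            this.label = label;
        }
    }

    // Hypothetical matrix; remembers its document and its taxa.
    static class Matrix {
        final Document owner;
        final OTUs otus;

        Matrix(Document owner, OTUs otus) {
            this.owner = owner;
            this.otus = otus;
        }
    }

    // Hypothetical converter base class: holds the one shared document.
    static class Converter {
        private Document document;

        void setNexmlProject(Document document) {
            this.document = document;
        }

        Document getDocument() {
            return document;
        }
    }

    // A sub-converter never builds its own document; it asks for the shared one.
    static class MatrixConverter extends Converter {
        Matrix convert(OTUs otus) {
            return getDocument().createCategoricalMatrix(otus);
        }
    }

    public static void main(String[] args) {
        Document document = new Document();
        OTUs taxa = document.createOTUs("taxa");

        MatrixConverter converter = new MatrixConverter();
        converter.setNexmlProject(document); // done once, at the start of conversion

        Matrix matrix = converter.convert(taxa);
        System.out.println(matrix.owner == document); // prints "true"
        System.out.println(document.matrixCount());   // prints "1"
    }
}
```

In this sketch, passing taxa created by one document to another document's
createCategoricalMatrix throws an exception. That is one way to enforce the
coherence described above; whether the real NeXML classes check this is not
stated in the message.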