From: Michael E. <mi...@sf...> - 2010-05-26 22:37:41
Re. metadata for tree elements

I'm just tossing out ideas here, so bear with me if this is a bit wacky. As most of you know, I'm quite new to the library and its applications, but I'll get to grips with it sooner or later.

My understanding is that trees are stored as newick strings within NCL. In order to annotate nodes/edges I see two possible approaches.

First, we can modify the newick string representation internally to provide ids for each node and edge in a tree. I envisage using something similar to the approach used in Paup, in which '$' is used as an edge identifier. Let's say we use '#' as a node identifier and '$' as an edge identifier, in the same way that ':' is used as an edgeLength identifier. So our internal newick representation goes from:

    ((A:1,B:1)AB:1,C:2)R:0;

to:

    ((A#n1$e1:1,B#n2$e2:1)AB#n3$e3:1,C#n4$e4:2)R#n5$e5:0;

Internally, we could then associate metadata in a hash table based on the ids in the tree. This has the obvious drawback that client code must know the node and edge ids somehow in order to fetch the metadata. One can imagine having something like:

    std::vector<NxsString> myNodeIds = treeBlock->getNodeIds(treeId); // treeId is an unsigned int

Not terribly intuitive to map node ids onto nodes, though.

I think a better solution is simply to represent trees as actual trees in memory, accompanied by preorder and postorder iterators. At face value this would be a much more radical redesign of the library. But not necessarily. What we could do is use a modified internal tree representation like "((A#n1$e1:1,B#n2$e2:1)AB#n3$e3:1,C#n4$e4:2)R#n5$e5:0;" and store a hash table of metadata as described above, but simply generate a true tree data structure on the fly from the modified newick representation when some specific tree metadata is requested. So the tree storage would be a set of modified newick strings.
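
A minimal sketch of how the '#'/'$' ids could be pulled out of such a modified newick string, so that they can serve as keys into a metadata table. Everything here is an assumption for illustration: collectIds and MetadataTable are hypothetical names, not part of NCL, std::string stands in for NxsString to keep the example self-contained, and quoted labels and newick comments are not handled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of NCL: collects the tokens that follow a
// marker character ('#' for node ids, '$' for edge ids) in a modified
// newick string. Ids are returned in the order they appear in the string.
std::vector<std::string> collectIds(const std::string &newick, char marker)
{
    // Characters that terminate an id token in this sketch.
    const std::string delimiters = "#$:,();";
    std::vector<std::string> ids;
    std::size_t pos = newick.find(marker);
    while (pos != std::string::npos) {
        std::size_t end = newick.find_first_of(delimiters, pos + 1);
        if (end == std::string::npos)
            end = newick.size();
        ids.push_back(newick.substr(pos + 1, end - pos - 1));
        pos = newick.find(marker, end);
    }
    return ids;
}

// Hypothetical metadata store: node or edge id -> free-form annotation text.
typedef std::map<std::string, std::string> MetadataTable;
```

With the example tree above, collectIds(tree, '#') would yield the node ids and collectIds(tree, '$') the edge ids, each of which could then be looked up in a MetadataTable.
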
But the newick string would be used to generate a true tree data structure, using client code like:

    // a tree structure is generated on the fly from the stored newick string
    TreeStructure * tree = myTreeBlock->getTreeStructure(id); // id is an unsigned int
    TreeStructure::iterator it = tree->preorder_begin();
    while (it != tree->preorder_end()) {
        // NxsString-formatted metadata is looked up in a pre-stored hash table
        NxsString nodeMetadata = *it;
        NxsString nodeId = it.id();
        // do something with the metadata....
        ++it;
    }

Just tossing out ideas here. Storing trees as strings and using them to generate tree data structures on the fly is quite an efficient way of doing things in terms of memory use and speed (for example, I have a very simple tree class with pre- and postorder iterators that can generate the complete ncbi taxonomy from an in-memory newick string in a few milliseconds).

As for the strong typing of C++: metadata could be stored as a sequence of "boost::any" objects, and client code could request each metadata item in the format it can handle, ignoring the item if it cannot be converted to the requested type. I think it really is up to client code to know what types of data it can handle.

Mick

----- Original Message -----
From: "Jeet Sukumaran" <je...@ku...>
Cc: "ncl-devel" <ncl...@li...>
Sent: Tuesday, May 25, 2010 9:18:54 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Ncl-devel] annotation API for NCL

I read this briefly earlier this morning (yes, "morning", thanks to lingering jet lag + baby feeding times, I have been experiencing this phenomenon quite a bit recently), but now the link seems dead ...

Anyway, I think the proposed design makes sense, if I remember correctly ....

Using the Git-ique metaphors of "plumbing" vs. "porcelain": The "plumbing" in this case is the data structure associated with the reader, which stores the triplets, including the pointer to the appropriate Nexus objects.
As you note, added syntactic sugar would function as "porcelain", allowing for easier access to the annotations. Am I right in remembering that, with the proposed porcelain, if I want all the annotations referring to a Tree object, I can just ask the Tree object for all annotations that reference it, and I get back a vector (or list)?

I guess the enum mechanism + casting is the only way to reconcile the flexibility of annotations' referents with the strong typing of C++. It would be great if the porcelain could wrap this up as much as possible.

The problem of annotating tree edges/nodes and character columns/rows/cells remains.

For the former, I personally would prefer to avoid the client-side cost/hassle of tracking a particular node's post-order traversal index. Instead, I think that if client code wants to access annotations on tree components, it would be perfectly fair to expect them to ask for an NCL Tree object and deal with or otherwise harvest info from that. That way, edge and node annotations can be managed using the same "porcelain" as for any other "first class" NCL objects.

The character matrix case is a little more complicated, as a full implementation of a NeXML spec'd character object model requires a fat/rich stack of fat/rich classes (in Asia, we might say "prosperous"). If this needs to be done, and is going to be done as part of the GSoC NCL/NeXML project anyway, and if the implementation ends up with character cells/rows/columns all being "first class" NCL objects, then the annotations might be bundled up with these as part of their porcelain. Personally, I would be quite happy for a "prosperous" character object model in NCL. But I am not the one doing the coding!

If we do not go that route, then maybe a specialized data structure associated with a character matrix that can facultatively take (a) a row index, (b) a column index or (c) a row/column = cell index and return the associated list of annotations might be desirable porcelain.
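
A rough sketch of what that last option might look like, purely as an illustration: MatrixAnnotationIndex and all of its member names are hypothetical, nothing here is existing NCL API, and annotations are reduced to plain strings to keep the example self-contained.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class, not part of NCL: stores annotations keyed by row,
// by column, or by (row, column) cell, and returns the matching list.
class MatrixAnnotationIndex {
public:
    typedef std::vector<std::string> AnnotationList;

    void annotateRow(unsigned row, const std::string &note) { rows_[row].push_back(note); }
    void annotateColumn(unsigned col, const std::string &note) { columns_[col].push_back(note); }
    void annotateCell(unsigned row, unsigned col, const std::string &note) {
        cells_[std::make_pair(row, col)].push_back(note);
    }

    // Each query returns an empty list when nothing was annotated.
    AnnotationList forRow(unsigned row) const { return lookup(rows_, row); }
    AnnotationList forColumn(unsigned col) const { return lookup(columns_, col); }
    AnnotationList forCell(unsigned row, unsigned col) const {
        return lookup(cells_, std::make_pair(row, col));
    }

private:
    // Shared lookup that never inserts into the table.
    template <typename Map>
    static AnnotationList lookup(const Map &table, const typename Map::key_type &key) {
        typename Map::const_iterator it = table.find(key);
        if (it == table.end())
            return AnnotationList();
        return it->second;
    }

    std::map<unsigned, AnnotationList> rows_;
    std::map<unsigned, AnnotationList> columns_;
    std::map<std::pair<unsigned, unsigned>, AnnotationList> cells_;
};
```

Keeping the three tables separate means a cell annotation does not silently show up under its row or column; whether row queries should also collect cell-level annotations is a design choice this sketch leaves open.
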

-- jeet

On 5/26/10 4:05 AM, Mark Holder wrote:
> Hi ncl-devel,
> This email was going to follow up on a conversation that was started on a thread that I kicked off when announcing the fact that NCL2.1 now has reasonable documentation, but I think that it is better to use a Wiki to launch this discussion rather than lots of email threads. I'm certainly happy to discuss the issues via email (rather than on the wiki), but I thought it would be best to refer interested parties to
> https://sourceforge.net/apps/mediawiki/ncl/index.php?title=AnnotationAPIDiscussion
> for the initial message in the discussion.
>
> The general topic was, "How should NCL store generic annotations and make them available to client code?"
>
> The motivation to get moving on this is the fact that Mick Elliot is adding support for nexml and phyloxml as a part of the Google Summer of Code, so we'll need to have a decision soon (next week).
>
> I'm honestly not sure who has access to the Wiki mentioned. I just enabled it as a feature through sourceforge this morning. Let me know if you have problems reading or writing, and I'll try to figure out how it is administered.
>
> all the best,
> Mark
>
> Mark Holder
> mth...@ku...
> http://phylo.bio.ku.edu/mark-holder
>
> ==============================================
> Department of Ecology and Evolutionary Biology
> University of Kansas
> 6031 Haworth Hall
> 1200 Sunnyside Avenue
> Lawrence, Kansas 66045
>
> lab phone: 785.864.5789
> fax (shared): 785.864.5860
> ==============================================
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Ncl-devel mailing list
> Ncl...@li...
> https://lists.sourceforge.net/lists/listinfo/ncl-devel

--
--------------------------------------
Jeet Sukumaran
--------------------------------------
Division of Herpetology
Department of Ecology and Evolutionary Biology / Biodiversity Institute
University of Kansas
Dyche Hall
1345 Jayhawk Blvd
Lawrence KS 66045-7561
--------------------------------------
Phone: 785-864-3439
Fax: 785-864-5335
E-mail(s): je...@ku...
--------------------------------------
Personal Pages: http://jeetworks.org/
Photograph Galleries: http://jeet.smugmug.com/
Phylogeography and Biogeography Blog: http://geodendron.blogspot.com
--------------------------------------