TinyXML / Discussion / Developer: RFC: Visit() API

Lee Thomason - 2006-06-19

We're planning to add an API to TinyXML that would "visit" each node and call back the host application as the XML tree is walked. This is very similar to how a SAX API works, although it is not a "SAX mode" for TinyXML: it still parses the entire DOM up front.

The interface that the host application provides to TinyXML for callbacks is:

/**
    If you call the Visit() method, you must pass it a TiXmlVisitHandler
    to handle the callbacks. For nodes that contain other nodes (Document, Element)
    you will get called with a Start/End pair. Callbacks for nodes that are always
    leaves are prefixed with "On".

    Generally Visit() is called on the TiXmlDocument, although all nodes support visiting.

    You should never change the document from a callback.
*/
class TiXmlVisitHandler
{
public:
    virtual void StartDocument( const TiXmlDocument& doc, int depth ) = 0;
    virtual void EndDocument( const TiXmlDocument& doc, int depth ) = 0;

    virtual void StartElement( const TiXmlElement& element, const TiXmlAttribute* firstAttribute, int depth ) = 0;
    virtual void EndElement( const TiXmlElement& element, int depth ) = 0;

    virtual void OnDeclaration( const TiXmlDeclaration& declaration, int depth ) = 0;
    virtual void OnText( const TiXmlText& text, int depth ) = 0;
    virtual void OnComment( const TiXmlComment& comment, int depth ) = 0;
    virtual void OnUnknown( const TiXmlUnknown& unknown, int depth ) = 0;
};

And to start the visiting:

    /** Walk the XML tree visiting this node and all of its children.
    */
virtual void Visit( TiXmlVisitHandler* content, int depth = 0 ) const;
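To make the Start/End pairing and the depth argument concrete, here is a minimal sketch of the idea. It does not use TinyXML itself: `Node`, `Handler`, `Walk`, `OutlineHandler`, and `OutlineDemo` are hypothetical stand-ins invented for this illustration, and only elements and text are modeled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML node; not a TinyXML class.
struct Node {
    std::string value;             // element name, or the text of a leaf
    bool isText;
    std::vector<Node> children;
    Node(const std::string& v, bool text) : value(v), isText(text) {}
};

// Stand-in for the proposed handler: Start/End for containers, On for leaves.
struct Handler {
    virtual ~Handler() {}
    virtual void StartElement(const Node& element, int depth) = 0;
    virtual void EndElement(const Node& element, int depth) = 0;
    virtual void OnText(const Node& text, int depth) = 0;
};

// Depth-first walk that calls the handler back for every node.
void Walk(const Node& node, Handler* handler, int depth = 0) {
    assert(handler != nullptr);
    if (node.isText) {
        handler->OnText(node, depth);
        return;
    }
    handler->StartElement(node, depth);
    for (const Node& child : node.children) Walk(child, handler, depth + 1);
    handler->EndElement(node, depth);
}

// Example handler: writes an outline indented by two spaces per depth level.
struct OutlineHandler : Handler {
    std::ostringstream out;
    void Indent(int depth) { out << std::string(depth * 2, ' '); }
    void StartElement(const Node& e, int depth) override {
        Indent(depth); out << "<" << e.value << ">\n";
    }
    void EndElement(const Node& e, int depth) override {
        Indent(depth); out << "</" << e.value << ">\n";
    }
    void OnText(const Node& t, int depth) override {
        Indent(depth); out << t.value << "\n";
    }
};

// Builds <a><b>hi</b></a> and returns the outline text.
std::string OutlineDemo() {
    Node root("a", false);
    Node b("b", false);
    b.children.push_back(Node("hi", true));
    root.children.push_back(b);
    OutlineHandler handler;
    Walk(root, &handler);
    return handler.out.str();
}
```

Calling `OutlineDemo()` yields one line per callback, with the End callback of each element lining up with its Start callback.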

Comments / improvements / suggestions appreciated.

lee

- Alexander Sashnov - 2006-06-19
  
  IMHO this is a good proposal.
  Maybe name it TiXmlVisitor?
  
- Lee Thomason - 2006-06-19
  
  Left off the scope. :)
  
  Should be:
  
  TiXmlNode::Visit
  
- John-Philip Leonard Johansson - 2006-06-25
  
  I agree with Alex: the proper name should be TiXmlVisitor, not TiXmlVisitHandler. The idea is great, though, and I hope it gets implemented.
  
- Lee Thomason - 2006-06-27
  
  Good suggestions - TiXmlVisitor is better.
  
  lee
  
- John-Philip Leonard Johansson - 2006-06-30
  
  As for the accepting function (the one on the XML tree nodes that takes a TiXmlVisitor), there is a common name for it: accept(). I've also seen traverse(), but accept() is the most common. You can google the "Visitor design pattern" to see other common names in the pattern, like the visiting class being called Visitor (TiXmlVisitor) and its visitConcreteType functions (TiXmlVisitor::VisitDeclaration(), TiXmlVisitor::VisitText(), etc).
  
- John-Philip Leonard Johansson - 2006-07-06
  
  I have recently implemented my own Visitor pattern for a menu traversal, where I needed to know when I left a submenu and could filter out entire subtrees. As usual, this has already been done many times before and is known as a Hierarchical Visitor Pattern.
  
  I strongly suggest that TinyXML follow this pattern. You can read more about it here: http://c2.com/cgi/wiki?HierarchicalVisitorPattern
  
  Your visitor pattern here is almost the same (except for the function names, which I hope will be changed). The big difference is the use of bools as return values, which gives a visitor the opportunity to "trim off" a whole section of XML elements! Read more about that in the link above.
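The trimming idea can be sketched as follows, using one common formulation of the hierarchical visitor: returning false from the enter callback skips the children, and the exit callback still runs. The types here (`Node`, `Visitor`, `SkipPrivate`, `PruneDemo`) are hypothetical stand-ins for illustration, not TinyXML classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node;

// Stand-in visitor: bool return values steer the traversal.
struct Visitor {
    virtual ~Visitor() {}
    virtual bool VisitEnter(const Node&) { return true; }  // false: skip children
    virtual bool VisitExit(const Node&) { return true; }   // false: skip later siblings
};

// Hypothetical stand-in for an element node.
struct Node {
    std::string name;
    std::vector<Node> children;
    explicit Node(const std::string& n) : name(n) {}

    bool Accept(Visitor* v) const {
        assert(v != nullptr);
        if (v->VisitEnter(*this)) {
            for (const Node& child : children)
                if (!child.Accept(v)) break;
        }
        return v->VisitExit(*this);
    }
};

// Records every element it enters, but trims off any subtree named "private".
struct SkipPrivate : Visitor {
    std::vector<std::string> seen;
    bool VisitEnter(const Node& n) override {
        if (n.name == "private") return false;  // prune this subtree
        seen.push_back(n.name);
        return true;
    }
};

// Walks root{ public{x}, private{secret}, tail } and joins the visited names.
std::string PruneDemo() {
    Node root("root");
    Node pub("public");
    pub.children.push_back(Node("x"));
    Node priv("private");
    priv.children.push_back(Node("secret"));
    root.children.push_back(pub);
    root.children.push_back(priv);
    root.children.push_back(Node("tail"));

    SkipPrivate visitor;
    root.Accept(&visitor);

    std::string joined;
    for (const std::string& name : visitor.seen) {
        if (!joined.empty()) joined += ' ';
        joined += name;
    }
    return joined;
}
```

In this sketch neither "private" nor "secret" is recorded, while the sibling "tail" that follows the pruned subtree is still visited.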
  
- Lee Thomason - 2006-07-20
  
  What a great link! Thanks for the info. I like the trimming, and the better naming convention. However, since we are visiting known things I could be more specific, and the depth is very useful for XML parsing.
  
  With that thought, the new version:
  
  class TiXmlVisitor
  {
  public:
      virtual ~TiXmlVisitor() {}
  
      virtual bool EnterDocument( const TiXmlDocument& doc, int depth ) = 0;
      virtual bool ExitDocument( const TiXmlDocument& doc, int depth ) = 0;
  
      virtual bool EnterElement( const TiXmlElement& element, const TiXmlAttribute* firstAttribute, int depth ) = 0;
      virtual bool ExitElement( const TiXmlElement& element, int depth ) = 0;
  
      virtual bool OnDeclaration( const TiXmlDeclaration& declaration, int depth ) = 0;
      virtual bool OnText( const TiXmlText& text, int depth ) = 0;
      virtual bool OnComment( const TiXmlComment& comment, int depth ) = 0;
      virtual bool OnUnknown( const TiXmlUnknown& unknown, int depth ) = 0;
  };
  
  Engaged by a call to this TiXmlNode member:
  virtual bool Accept( TiXmlVisitor* visitor, int depth=0 ) const = 0;
  
  How does that look?
  lee
  
- Lee Thomason - 2006-07-20
  
  Mistake - the OnXYZ above should be VisitXYZ.
  
  lee
  
- John-Philip Leonard Johansson - 2006-07-23
  
  No problem. That link taught me a bunch too.
  
  I saw your post now after I e-mailed you. But for the public I'll repeat my comments.
  
  I prefer short function names and using the parameters to figure out what the function does. VisitComment( TiXmlComment ) seems kinda redundant.
  
  The depth can be useful. -Can- be. So I think it's better added "from the outside", like a Visitor. I supplied an example of a depth-aware visitor one can inherit from to know the depth.
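The example mentioned above was sent by e-mail and is not reproduced in this thread. The following is only a sketch of what tracking depth "from the outside" could look like: a base visitor counts enter/exit calls so that the tree interface needs no depth parameter. All names (`Node`, `Visitor`, `DepthAwareVisitor`, `MaxDepth`, `MaxDepthDemo`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Node;

struct Visitor {
    virtual ~Visitor() {}
    virtual bool VisitEnter(const Node&) { return true; }
    virtual bool VisitExit(const Node&) { return true; }
};

// Hypothetical stand-in for an element node; note there is no depth parameter.
struct Node {
    std::string name;
    std::vector<Node> children;
    explicit Node(const std::string& n) : name(n) {}

    bool Accept(Visitor* v) const {
        if (v->VisitEnter(*this)) {
            for (const Node& child : children)
                if (!child.Accept(v)) break;
        }
        return v->VisitExit(*this);
    }
};

// Keeps the depth itself; subclasses override Enter/Exit and read Depth().
class DepthAwareVisitor : public Visitor {
public:
    bool VisitEnter(const Node& n) override {
        bool descend = Enter(n);   // Depth() is the depth of n here
        ++depth_;
        return descend;
    }
    bool VisitExit(const Node& n) override {
        --depth_;
        assert(depth_ >= 0);       // every exit must match an earlier enter
        return Exit(n);
    }
protected:
    int Depth() const { return depth_; }
    virtual bool Enter(const Node&) { return true; }
    virtual bool Exit(const Node&) { return true; }
private:
    int depth_ = 0;
};

// Example subclass: remembers the deepest level it has seen.
struct MaxDepth : DepthAwareVisitor {
    int deepest = 0;
    bool Enter(const Node&) override {
        deepest = std::max(deepest, Depth());
        return true;
    }
};

// Walks a{ b{c}, d } and returns the deepest level reached.
int MaxDepthDemo() {
    Node a("a");
    Node b("b");
    b.children.push_back(Node("c"));
    a.children.push_back(b);
    a.children.push_back(Node("d"));
    MaxDepth visitor;
    a.Accept(&visitor);
    return visitor.deepest;
}
```

Because the exit callback runs even when a subtree is skipped, the counter stays balanced in this formulation.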
  
  The attribute traversing is a bit problematic... In principle I think it should be part of the normal interface and work exactly like the rest: Visit( TiXmlAttribute ) returns a bool, true if you want to visit the remaining attributes. The problem is that this has to be called -in the middle of- VisitEnter( TiXmlElement ), otherwise a PrintToMemoryVisitor isn't possible. But I made an AttributeVisitor and sent it to you too. One can inherit from that to visit the whole XML, including attributes.
  
  I'll think some more about them and keep you updated.
  
- John-Philip Leonard Johansson - 2006-07-23
  
  Add to the above post:
  The Attribute Visitor could be integrated into the normal interface with TiXmlVisitor::VisitEnterBegin and VisitEnterEnd taking a TiXmlElement, where Visit( TiXmlAttribute ) is called in between.
  
  To keep the interface small and simple I would like the TiXmlAttribute parameter in EnterElement removed. But adding two new functions (splitting EnterElement and adding VisitAttribute) works against keeping it small :P
  
  Oh, and I recommend default implementations over pure virtuals in this visitor, since only very few visitors would want to visit -all- TiXmlNodes. Implementing a TiXmlVisitor should be easy and fast, meaning one just overrides what one wants.
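A small sketch of why default implementations help: the derived visitor below overrides a single callback and inherits do-nothing defaults for the rest. `Text`, `Comment`, `Visitor`, `TextCounter`, `Feed`, and `CountDemo` are hypothetical stand-ins, not TinyXML classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two leaf node types.
struct Text { std::string value; };
struct Comment { std::string value; };

// Every callback has a default that does nothing and continues the traversal.
struct Visitor {
    virtual ~Visitor() {}
    virtual bool VisitText(const Text&) { return true; }
    virtual bool VisitComment(const Comment&) { return true; }
};

// Only cares about text, so it overrides exactly one function.
struct TextCounter : Visitor {
    int count = 0;
    bool VisitText(const Text&) override { ++count; return true; }
};

// Stands in for a tree walk: hands a fixed sequence of nodes to the visitor.
void Feed(Visitor* v) {
    assert(v != nullptr);
    v->VisitText(Text{"one"});
    v->VisitComment(Comment{"ignored"});
    v->VisitText(Text{"two"});
}

int CountDemo() {
    TextCounter counter;
    Feed(&counter);
    return counter.count;
}
```

With pure virtuals, `TextCounter` would also have to spell out an empty `VisitComment` (and one function per remaining node type) before it could be instantiated.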
  
- Lee Thomason - 2006-07-23
  
  Doh! I just responded to the personal email without checking here first. We'll stick with the forum.
  
  Breaking out the threads:
  
  #1: Method name form:
  a. VisitComment( TiXmlComment& )
  b. Visit( TiXmlNode& )
  c. Visit( VisitComment& ) (using overloads to differentiate)
  
  (b.) is my least preferred, as it pushes a switch case to the client. Good approach for generic data types, but TinyXML has a limited known set of types.
  
  I prefer the more verbose (a) over (c) preferred by JP. Anyone else?
  
  #2. Depth does seem a little hokey. I'm considering pulling that and adding a Depth() property to TiXmlNode.
  
  #3 Attribute traversal isn't clean. I think it's just a case of what is easier for the client program: having a linked list to traverse, or get call backs for each attribute?
  
  SAX uses the list of attributes, even in their v2 implementation. They have done quite a bit of work on the XML visitor paradigm, and I was leaning toward just accepting their standard.
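For comparison, here is a sketch of the "list of attributes" style, where the element callback receives the head of a linked list and the client walks it. The types and the `next` pointer are hypothetical stand-ins chosen for this example, not the TinyXML attribute API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an attribute stored in a singly linked list.
struct Attribute {
    std::string name;
    std::string value;
    const Attribute* next;   // null at the end of the list
};

struct Element {
    std::string name;
    const Attribute* firstAttribute;   // null when there are no attributes
};

struct Visitor {
    virtual ~Visitor() {}
    virtual bool VisitEnter(const Element&, const Attribute* /*first*/) { return true; }
};

// Prints the opening tag, walking the attribute list inside the callback.
struct TagPrinter : Visitor {
    std::string out;
    bool VisitEnter(const Element& e, const Attribute* first) override {
        out += "<" + e.name;
        for (const Attribute* a = first; a != nullptr; a = a->next)
            out += " " + a->name + "=\"" + a->value + "\"";
        out += ">";
        return true;
    }
};

// Stands in for the element's accept function.
void Enter(const Element& e, Visitor* v) {
    assert(v != nullptr);
    v->VisitEnter(e, e.firstAttribute);
}

std::string AttributeDemo() {
    Attribute cls = { "class", "x", nullptr };
    Attribute id = { "id", "1", &cls };
    Element item = { "item", &id };
    TagPrinter printer;
    Enter(item, &printer);
    return printer.out;
}
```

The whole opening tag is produced inside one callback here, which is what makes printing straightforward with the list approach.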
  
  #4 Default implementation of the callbacks. Done - CVS has been changed.
  
- John-Philip Leonard Johansson - 2006-07-26
  
  Hi,
  
  #1 Option b would mostly ruin option c. I'm all for c. The parameter already says what Visit() is visiting: both TiXmlVisitor::Visit( TiXmlComment& ) (guessing that the VisitComment& in Lee's post was just a typo) and visitor.Visit( *this ); inside a TiXmlComment, or visitor.Visit( comment );, already say so much. It also opens up possibilities with templates and function pointers, since you don't have to differentiate between the types explicitly and can leave that up to the program. Being verbose in this case will put some limitations on creative Visitors.
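The template argument can be illustrated with a short sketch. Because every callback is named `Visit`, one function template can drive the visitor over any node type, with the overload picked at compile time. `Text`, `Comment`, `Visitor`, `VisitAll`, `Logger`, and `TemplateDemo` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for two node types.
struct Text { std::string value; };
struct Comment { std::string value; };

// Option (c): one name, overloaded on the node type.
struct Visitor {
    virtual ~Visitor() {}
    virtual bool Visit(const Text&) { return true; }
    virtual bool Visit(const Comment&) { return true; }
};

// Generic helper: works for any node type the visitor has an overload for.
// Returns how many nodes were visited before the visitor asked to stop.
template <typename NodeT>
int VisitAll(Visitor* visitor, const std::vector<NodeT>& nodes) {
    assert(visitor != nullptr);
    int visited = 0;
    for (const NodeT& node : nodes) {
        ++visited;
        if (!visitor->Visit(node)) break;   // overload chosen from NodeT
    }
    return visited;
}

struct Logger : Visitor {
    std::string log;
    bool Visit(const Text& t) override { log += "text:" + t.value + ";"; return true; }
    bool Visit(const Comment& c) override { log += "comment:" + c.value + ";"; return true; }
};

std::string TemplateDemo() {
    std::vector<Text> texts = { Text{"a"}, Text{"b"} };
    std::vector<Comment> comments = { Comment{"c"} };
    Logger logger;
    VisitAll(&logger, texts);
    VisitAll(&logger, comments);
    return logger.log;
}
```

With the verbose names of option (a), `VisitAll` would need a separate version (or an extra dispatch layer) for each of `VisitText`, `VisitComment`, and so on.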
  
  #2. Adding to the TiXml interface isn't the point of the Visitor, but I can see other uses for Depth.
  
  #3. I'm open for following standards. I haven't checked SAX yet so no comments.
  
  #2 & #3 are solvable through Visitors. I'm not happy with mine yet though...
  
  #4. Awesome!
  
- Lee Thomason - 2006-08-08
  
  The ability to templatize is a good point. I switched over to "approach c".
  
  There are really 3 ways to print in TinyXML:
  1. Print( FILE* )
  2. operator<<
  3. PrintToMemory
  
  The Print(FILE*) is special because it has a low memory overhead. However the operator<< and PrintToMemory are almost the same code. I'm considering wrapping a single "Print" visitor into the code that is called in both cases, with different options.
  
  Still experimenting.
  
- John-Philip Leonard Johansson - 2006-08-10
  
  Operator << becomes supported through the Print visitor's char buffer accessor. Like so:
  PrintVisitor pv;
  xmlRoot.accept( pv );
  cout << pv.buffer(); // using the standard libs support for char* or string
  Unless I've missed some point?
  
  Can't Print(FILE*) be supported by the print visitor too? Or at least be refactored out to a FilePrintVisitor?
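A sketch of how one print visitor could back both PrintToMemory and operator<<, along the lines suggested above. Everything here is a hypothetical stand-in (`Node`, `Visitor`, `PrintVisitor`, `PrintToMemory`, `StreamDemo`); it does no escaping or formatting, and the low-memory FILE* path is not modeled.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Node;

struct Visitor {
    virtual ~Visitor() {}
    virtual bool VisitEnter(const Node&) { return true; }
    virtual bool VisitExit(const Node&) { return true; }
    virtual bool Visit(const Node&) { return true; }   // text leaf
};

// Hypothetical stand-in for an XML node (element or text).
struct Node {
    std::string value;
    bool isText;
    std::vector<Node> children;
    Node(const std::string& v, bool text) : value(v), isText(text) {}

    bool Accept(Visitor* v) const {
        assert(v != nullptr);
        if (isText) return v->Visit(*this);
        if (v->VisitEnter(*this)) {
            for (const Node& child : children)
                if (!child.Accept(v)) break;
        }
        return v->VisitExit(*this);
    }
};

// Accumulates the serialized tree in a string buffer.
class PrintVisitor : public Visitor {
public:
    bool VisitEnter(const Node& n) override { buffer_ += "<" + n.value + ">"; return true; }
    bool VisitExit(const Node& n) override { buffer_ += "</" + n.value + ">"; return true; }
    bool Visit(const Node& n) override { buffer_ += n.value; return true; }
    const std::string& buffer() const { return buffer_; }
private:
    std::string buffer_;
};

// Both entry points share the same visitor code.
std::string PrintToMemory(const Node& root) {
    PrintVisitor pv;
    root.Accept(&pv);
    return pv.buffer();
}

std::ostream& operator<<(std::ostream& os, const Node& root) {
    return os << PrintToMemory(root);
}

// Streams <a><b>hi</b></a> into a string stream and returns the result.
std::string StreamDemo() {
    Node root("a", false);
    Node b("b", false);
    b.children.push_back(Node("hi", true));
    root.children.push_back(b);
    std::ostringstream os;
    os << root;
    return os.str();
}
```

A FILE*-based variant would presumably write in each callback instead of buffering, which is what would keep its memory overhead low.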
  