XML Parsing

Developers
2001-03-06
2001-06-13
  • Paul Rogers

    Paul Rogers - 2001-03-06

    After a little exploration on the web, I think I've found an XML library that could serve our purpose.  It is called expat.  It's small and relatively simple.  This could greatly simplify our parsing code, which is currently a bit messy.

    I'll look into it a bit further and let you all know how it turns out.

     
    • Ryan Burkett

      Ryan Burkett - 2001-06-12

      Hi all,
      I've been doing a bit of research into XML myself.  The current HTreeReader roughly approximates a SAX XML parser, and HTemplateField adds validation.  But these classes are getting unwieldy.

      Ideally, I'd like to use a validating XML parser with schema support in Manta for the following reasons:

      1. Rather than using a bunch of spaghetti code that responds to SAX events, we could create well-formed data classes based on the contents of the XML document (see http://www.xmlmag.com/upload/free/features/xml/2000/03sum00/jb0300/jb0300.asp)

      2. Schemas would take care of the validation
      3. Schemas, unlike DTDs, allow an XML document to be extended. For example, in the master level file's schema we could declare that an actor has a "type", and that the type is given by a string. Then Manta programmers would declare in another schema, derived from ours, that the type must be one of a set of strings, { "Shark", "Submarine", "Diver" }

      4. There is a standard proposed by Sun (JSR-031) that lets you generate Java classes from XML schemas.  Then #1 above could be done automatically, assuming that the standard can be applied to C++ classes as well.
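
      To make point 3 a bit more concrete, here is a rough C++ analogy of a derived schema narrowing a rule from the master schema. This is not schema syntax and not anything Xerces provides; the class names ActorTypeRule and EnumeratedActorTypeRule are made up for illustration only.

```cpp
#include <set>
#include <string>
#include <utility>

// Stands in for the master schema: an actor's type may be any string.
// (Hypothetical class, not part of Manta or any XML library.)
class ActorTypeRule {
public:
    virtual ~ActorTypeRule() = default;
    virtual bool accepts(const std::string&) const { return true; }
};

// Stands in for a game-specific schema derived from the master one:
// the type must be one of a fixed set of strings.
class EnumeratedActorTypeRule : public ActorTypeRule {
public:
    explicit EnumeratedActorTypeRule(std::set<std::string> allowed)
        : allowed_(std::move(allowed)) {}

    bool accepts(const std::string& type) const override {
        return allowed_.count(type) > 0;
    }

private:
    std::set<std::string> allowed_;
};
```

      A game would construct the derived rule with { "Shark", "Submarine", "Diver" }, and anything that only knows about the base rule would keep working unchanged. With real schemas the parser would do this check for us.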

      The library I'm looking at is Xerces, at http://xml.apache.org/xerces-c/index.html. The daily builds have started to implement XML schema support (schemas themselves are still only a proposed XML standard).

      I will try looking into this more, and let you know if it helps simplify the file-loading code.

       
    • Paul Rogers

      Paul Rogers - 2001-06-13

      I worked with the Java SAX parser on my last work term.  The SAX model is purely event driven, and has no concept of the tree structure that HTreeReader provides.

      HTreeReader seems to me to be a cross between a DOM parser and a SAX parser.  The DOM (Document Object Model) represents the XML file as a tree of nodes.  From what I can see HTreeReader uses a bit of both.
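
      As a small sketch of what "purely event driven" means in practice: a SAX-style handler only ever sees one event at a time, so if it wants to know where it is in the document it has to keep its own stack. The class below is hypothetical (it is not HTreeReader, and not the interface of any real parser); it assumes a parser that calls startElement, characters and endElement in document order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical SAX-style handler. A parser would call these methods in
// document order; the handler is never given a tree to look at.
class PathTrackingHandler {
public:
    void startElement(const std::string& name) { path_.push_back(name); }

    void endElement(const std::string& name) {
        // Events must nest properly, as they do in well-formed XML.
        assert(!path_.empty() && path_.back() == name);
        path_.pop_back();
    }

    void characters(const std::string& text) {
        // The event carries only the text; the location comes from our own stack.
        seen_.push_back(currentPath() + " = " + text);
    }

    const std::vector<std::string>& seen() const { return seen_; }

private:
    std::string currentPath() const {
        std::string path;
        for (const std::string& name : path_) path += "/" + name;
        return path;
    }

    std::vector<std::string> path_;  // names of the currently open elements
    std::vector<std::string> seen_;  // "path = text" for each text event
};
```

      A DOM parser would do this bookkeeping internally and hand back the finished tree instead.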

      When I was looking at libraries a couple of months ago I did notice Xerces.  At the time I dismissed it because it is large, which means lots of extra stuff to download to get the game working.  Also, it doesn't have a BeOS port yet.  However, it may be worth considering anyway.

      My original idea for cleaning up the parsing was to build up from a simplified SAX parser, and then clean up HTreeReader so all it does is respond to the SAX events to build a DOM style object tree.

      The end result would be a class in Manta to which a program passes a file name and from which it gets back a generic tree representation of the file.
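
      Roughly, the tree-building half could look like the sketch below. This is not the actual XMLParser or HTreeReader code; TreeNode and TreeBuilder are placeholder names, attributes are left out, and it assumes the SAX layer delivers properly nested start/end events.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Generic tree node (placeholder name, not an existing Manta class).
struct TreeNode {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<TreeNode>> children;
};

// Receives SAX-style events and assembles them into a TreeNode tree.
class TreeBuilder {
public:
    void startElement(const std::string& name) {
        std::unique_ptr<TreeNode> node(new TreeNode);
        node->name = name;
        TreeNode* raw = node.get();
        if (open_.empty()) {
            root_ = std::move(node);  // the first element becomes the root
        } else {
            open_.back()->children.push_back(std::move(node));
        }
        open_.push_back(raw);
    }

    void characters(const std::string& text) {
        // Text is appended to the innermost open element.
        if (!open_.empty()) open_.back()->text += text;
    }

    void endElement(const std::string& name) {
        assert(!open_.empty() && open_.back()->name == name);
        open_.pop_back();
    }

    // Hands the finished tree to the caller once every element is closed.
    std::unique_ptr<TreeNode> takeRoot() {
        assert(open_.empty());
        return std::move(root_);
    }

private:
    std::unique_ptr<TreeNode> root_;
    std::vector<TreeNode*> open_;  // currently open elements, innermost last
};
```

      The file-name-in, tree-out class would then just open the file, run the SAX parser with a TreeBuilder attached, and return takeRoot().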

      All my work so far on the simplified SAX parser is in the XMLParser class.  The code is complete, but untested.  I don't think we should do any more work on parsing until we make a decision on which direction we are going.

       
