I was using TinyXML as the parser for a hypergraph library I've been working on.
First, I love your API. I'm using a SAX model for the import and your C++ API is great!
I did run into a problem, however.
The data files I'm importing are rather large: currently 600MB to 1GB, with multi-GB files in the near future. It appears that TinyXML implements a SAX parser by creating the DOM and then traversing it. In my case this results in very large memory usage: for the 1GB file, memory peaks at about 3.5GB.
For the time being I've switched to libxml. Its API isn't as nice, but it uses a streaming model for its SAX parser.
Would you be interested in changing to a streaming sax parser?
If I volunteered my help would that influence your decision?
That's interesting data on its own - I'm happy to hear that a 0.6-1.0GB file parses correctly. And the memory overhead of TinyXML being 3.5:1 is also good to know. (And something I've thought about reducing.)
Although your interest is appreciated, TinyXML has been very successful in its niche of "easy to use DOM parser" and I don't think it makes sense to change that. The SAX parser space is pretty crowded.
If there were sustained interest in "TinySAXML", I think we would need to consider whether to extend TinyXML or to add a separate project on SourceForge. (I lean toward the latter.)
lee
TinyXML is certainly easy to use (which is why I'm looking at it).
Do you think it would hurt the current TinyXML to implement a streaming SAX parser?
To create a second "TinySAXML" project, I'd imagine you would want to split TinyXML into two headers, one being code common to both front ends. That would probably work against the easy-to-use aspect.
Another question is how many people are trying to do screwball things like this?
It may simply be that loading multi-GB XML files falls into the realm of hand-written parsers.
Besides that, I'm very impressed with TinyXML. It is certainly easy to use, has a wonderful interface, and fulfills its role admirably. Bravo!
> To create a second "TinySAXML" project I'd imagine you would want to split
> TinyXML into two headers. One being common code to both front ends.
The problem is, what "level" of SAX parsing would such a library implement?
The DOM model is nice because of its simplicity: you slurp the XML file into memory, and you get all the data in one place, ready to use.
The SAX model, on the other hand, is callback-driven: the engine reads data in and calls user functions that process it somehow (store it, etc.). This approach is very different from DOM, so I suppose it requires a completely different interface, too.