Re: [Sax-devel] Should I use sax?
From: David M. <dav...@gm...> - 2006-05-22 14:49:31
On 22/05/06, Roberto Cosenza <li...@ro...> wrote:

> I'm writing an application that has to parse and MODIFY some XHTML.
> The result of the operation has to be the modified XHTML.
> I started using SAX (Xerces) and, though it seemed to work, I found out
> that SAX does not keep my doctype (all parsed documents take the XHTML
> Strict doctype). Besides, I was using the XHTMLSerializer class, which is
> deprecated.
> Maybe I should use DOM for this job? The document is usually small. Am I
> missing something?

There are two conditions for using an in-memory object tree like the DOM:

1. Your documents are relatively small.
2. Operations are relatively infrequent.

For example, if you're dealing with 100 MB documents, the DOM is definitely out; likewise, if you're working on a server that will handle hundreds or thousands of requests per second, or on a batch-processing system that will open tens of thousands of XML documents in quick succession, then the DOM is probably out. If neither of these applies, go ahead and use the DOM or another in-memory object tree.

If you do need to use SAX, the first thing to understand is that SAX does not actually include a writing component; whatever Xerces uses for output is outside SAX proper. SAX2 does support *reading* DOCTYPE information through its optional LexicalHandler interface:

http://www.saxproject.org/apidoc/org/xml/sax/ext/LexicalHandler.html

Finally, unless you do something special inside the DOCTYPE declaration (like declaring entities or extending XHTML in some way), you can just generate a boilerplate DOCTYPE when you write out your document.

All the best,


David

-- 
http://www.megginson.com/
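For small documents processed infrequently, the DOM route suggested in the reply might look like the minimal sketch below. It uses only the JAXP and DOM APIs bundled with Java; the class `DomXhtmlRewrite`, the method `rewrite`, and the example change (adding `class="edited"` to every `<p>`) are hypothetical. Because the identity `Transformer` does not reliably carry the DOCTYPE node through on its own, the sketch copies the original public and system identifiers into `OutputKeys.DOCTYPE_PUBLIC` and `OutputKeys.DOCTYPE_SYSTEM`. It also assumes you don't want the parser fetching the W3C DTD over the network, so the external subset is resolved to an empty document; named XHTML entities such as `&nbsp;` would then fail to parse.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse XHTML into a DOM, modify it, and write it back
// out with the same DOCTYPE it was read with.
class DomXhtmlRewrite {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static String rewrite(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Assumption: don't fetch the DTD over the network. An empty external
        // subset keeps the DOCTYPE identifiers, but named entities are lost.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Example modification (hypothetical): tag every XHTML <p> element.
        NodeList paras = doc.getElementsByTagNameNS(XHTML_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            ((Element) paras.item(i)).setAttribute("class", "edited");
        }

        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Copy the original DOCTYPE identifiers into the serializer settings.
        DocumentType doctype = doc.getDoctype();
        if (doctype != null && doctype.getSystemId() != null) {
            if (doctype.getPublicId() != null) {
                out.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            out.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
        }
        StringWriter buffer = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(buffer));
        return buffer.toString();
    }
}
```

With this approach a Transitional document stays Transitional on output instead of silently becoming Strict.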
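On the SAX side, the optional LexicalHandler interface linked in the reply reports the DOCTYPE through its `startDTD(name, publicId, systemId)` callback. The sketch below assumes the JDK's built-in SAX parser, which accepts the standard `http://xml.org/sax/properties/lexical-handler` property, and extends `org.xml.sax.ext.DefaultHandler2`, which implements LexicalHandler with empty defaults. The class name `DoctypeSniffer` and its fields are hypothetical, and resolving every external entity to an empty document is an assumption made so that nothing is downloaded.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that records the DOCTYPE reported via LexicalHandler.
class DoctypeSniffer extends DefaultHandler2 {
    String name;      // root element name declared in the DOCTYPE
    String publicId;  // e.g. "-//W3C//DTD XHTML 1.0 Strict//EN"
    String systemId;  // URI of the external DTD subset

    // LexicalHandler callback, invoked once if the document has a DOCTYPE.
    @Override
    public void startDTD(String name, String publicId, String systemId) {
        this.name = name;
        this.publicId = publicId;
        this.systemId = systemId;
    }

    // Assumption: resolve every external entity (including the DTD) to an
    // empty document so that nothing is fetched over the network.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return new InputSource(new StringReader(""));
    }

    static DoctypeSniffer read(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DoctypeSniffer handler = new DoctypeSniffer();
        // LexicalHandler is optional in SAX2, so it is registered through a
        // standard property rather than passed to parse().
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

A document without a DOCTYPE never triggers `startDTD`, so all three fields simply stay `null`.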
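Since SAX itself has no writer, one option outside SAX proper (and an alternative to the deprecated XHTMLSerializer) is JAXP's identity `TransformerHandler`, which accepts SAX events and serializes them. The sketch below emits a boilerplate DOCTYPE by setting the `DOCTYPE_PUBLIC` and `DOCTYPE_SYSTEM` output properties; the constants hold the XHTML 1.0 Strict identifiers published by the W3C, while the class `SaxXhtmlWriter` and the method `copyWithDoctype` are hypothetical. It assumes the default `TransformerFactory` is a `SAXTransformerFactory`, as the JDK's is.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical writer: streams SAX events into a JAXP identity serializer
// and emits whatever DOCTYPE the caller asks for.
class SaxXhtmlWriter {
    // XHTML 1.0 Strict identifiers, as published by the W3C.
    static final String XHTML_STRICT_PUBLIC = "-//W3C//DTD XHTML 1.0 Strict//EN";
    static final String XHTML_STRICT_SYSTEM =
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    static String copyWithDoctype(String xhtml, String publicId, String systemId)
            throws Exception {
        // Assumes the default factory supports SAX input (the JDK's does).
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler sink = stf.newTransformerHandler(); // identity transform
        // Set output properties before setResult(): some implementations
        // read them at the moment the result is attached.
        Transformer settings = sink.getTransformer();
        settings.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        if (systemId != null) {
            if (publicId != null) {
                settings.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
            }
            settings.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
        }
        StringWriter buffer = new StringWriter();
        sink.setResult(new StreamResult(buffer));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Assumption: don't fetch the input's DTD from the network.
        reader.setEntityResolver((pub, sys) -> new InputSource(new StringReader("")));
        // A SAX filter that modifies events could sit between reader and sink.
        reader.setContentHandler(sink);
        reader.parse(new InputSource(new StringReader(xhtml)));
        return buffer.toString();
    }
}
```

Passing the identifiers in rather than hard-coding them lets the same writer emit Strict, Transitional, or the identifiers captured from a LexicalHandler's `startDTD` callback.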