Re: [Sax-devel] Should I use sax?


On 22/05/06, Roberto Cosenza <li...@ro...> wrote:

> I'm writing an application that has to parse and MODIFY some XHTML.
> The result of the operation has to be the modified XHTML.
> I started using SAX (xerces) and, though it seemed to work, I found out
> that SAX does not keep my doctype (all parsed documents take the XHTML
> Strict doctype). Besides, I was using the XHTMLSerializer class, which is
> deprecated.
> Maybe I should use DOM for this job? The document is usually small. Am I
> missing something?

There are two conditions for using an in-memory object tree like the DOM:

1. Your documents are relatively small.
2. Operations are relatively infrequent.

For example, if you're dealing with 100 MB documents, the DOM is
definitely out; likewise, if you're working on a server that will
handle hundreds or thousands of requests per second or a
batch-processing system that will open tens of thousands of XML
documents quickly, then DOM is probably out.  If neither of these
applies, then go ahead and use the DOM or another in-memory object
tree.
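
As a rough illustration of the DOM route, here is a minimal sketch using only
the JAXP classes bundled with the JDK. The class name DomRoundTrip, the
retitle method, and the choice of <title> as the thing to modify are all
hypothetical. It copies the DOCTYPE identifiers found in the input onto the
serializer, and it assumes the document uses no DTD-defined entities, since
the external DTD is resolved to an empty stream to avoid a network fetch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: parse XHTML into a DOM, change it, write it back out.
class DomRoundTrip {
    static String retitle(String xhtml, String newTitle) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Assumption: no DTD-defined entities are used, so the external DTD
        // can be replaced by an empty stream instead of being downloaded.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // The modification itself: replace the text of the first <title>.
        Element title = (Element) doc.getElementsByTagNameNS("*", "title").item(0);
        if (title != null) {
            title.setTextContent(newTitle);
        }

        // Carry the original DOCTYPE identifiers over to the output.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            if (doctype.getPublicId() != null) {
                serializer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                serializer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Note that this only preserves the public and system identifiers; anything
declared in an internal DTD subset would not survive this particular sketch.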

If you do need to use SAX, then the first thing to understand is that
SAX does not actually include a writing component -- whatever Xerces
uses is outside of SAX proper.  SAX2 does support *reading* DOCTYPE
information through its optional LexicalHandler interface:

http://www.saxproject.org/apidoc/org/xml/sax/ext/LexicalHandler.html
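
To make that concrete, here is a small sketch of capturing the DOCTYPE with
LexicalHandler. The class name DoctypeSniffer and its fields are made up for
illustration; DefaultHandler2, startDTD, and the lexical-handler property URI
are standard SAX2. As a simplifying assumption, the external DTD is resolved
to an empty stream so that nothing is fetched over the network.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that remembers the DOCTYPE reported by the parser.
class DoctypeSniffer extends DefaultHandler2 {
    String doctypeName;
    String doctypePublicId;
    String doctypeSystemId;

    // LexicalHandler callback, fired when the parser sees <!DOCTYPE ...>.
    @Override
    public void startDTD(String name, String publicId, String systemId) {
        doctypeName = name;
        doctypePublicId = publicId;
        doctypeSystemId = systemId;
    }

    static DoctypeSniffer sniff(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        DoctypeSniffer handler = new DoctypeSniffer();
        reader.setContentHandler(handler);
        // LexicalHandler is optional, so it is registered as a property
        // rather than through a dedicated setter.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        // Assumption: skip downloading the external DTD.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }
}
```

The captured identifiers can then be handed to whatever component writes the
document back out.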

Finally, unless you do something special inside the DOCTYPE
declaration (like declaring entities or extending XHTML in some way),
you can just generate a boilerplate DOCTYPE when you write out your
document.
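
One possible way to do that while staying with SAX is sketched below. It is
not what Xerces' own serializers do; it swaps in the JAXP TransformerHandler
as the writing component and sets the DOCTYPE through output properties. The
class name SaxRewrite, the addBodyClass method, and the edit it performs
(setting a class attribute on <body>) are hypothetical, and the caller is
assumed to supply whichever DOCTYPE identifiers are appropriate.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical pipeline: parser -> filter (edits events) -> writer.
class SaxRewrite {
    static String addBodyClass(String xhtml, String cssClass,
                               String publicId, String systemId) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        // The filter modifies events in flight; everything else passes through.
        XMLFilterImpl filter = new XMLFilterImpl(parser) {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                if ("body".equals(localName)) {
                    AttributesImpl copy = new AttributesImpl(atts);
                    int i = copy.getIndex("class");
                    if (i >= 0) {
                        copy.setValue(i, cssClass);
                    } else {
                        copy.addAttribute("", "class", "class", "CDATA", cssClass);
                    }
                    atts = copy;
                }
                super.startElement(uri, localName, qName, atts);
            }
        };
        // Assumption: skip downloading the external DTD.
        filter.setEntityResolver((pub, sys) -> new InputSource(new StringReader("")));

        // The writer: an identity TransformerHandler that emits a boilerplate DOCTYPE.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler writer = tf.newTransformerHandler();
        writer.getTransformer().setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
        writer.getTransformer().setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
        StringWriter out = new StringWriter();
        writer.setResult(new StreamResult(out));

        filter.setContentHandler(writer);
        filter.parse(new InputSource(new StringReader(xhtml)));
        return out.toString();
    }
}
```

Because the writer is registered only as a ContentHandler, the input's own
DOCTYPE never reaches it; the output DOCTYPE comes entirely from the two
identifiers passed in.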

All the best,

David

--
http://www.megginson.com/