Re: [Sax-devel] Should I use sax?
From: Roberto C. <li...@ro...> - 2006-05-22 15:21:01
Hi David.

Thank you for your clarification. It is clear that SAX is the way to go for me. I can confirm that SAX is reading the doctype and validating accordingly. It was misleading that the XMLSerializer was outputting the "Strict" doctype regardless of the input one.

I have one more question: parsing a document that contains the fragment <a href="mylink.html" >mylink</a> also generates events for the a-attribute 'shape="rect"', which is not given in the input! Is there any way to turn this behavior off? It seems like the parser is inserting some implicit attributes here and there...

/Roberto

David Megginson wrote:
> On 22/05/06, Roberto Cosenza <li...@ro...> wrote:
>
>> I'm writing an application that has to parse and MODIFY some XHTML.
>> The result of the operation has to be the modified XHTML.
>> I started using SAX (xerces) and, though it seemed to work, I found out
>> that SAX does not keep my doctype (all parsed documents take the XHTML
>> Strict doctype). Besides, I was using the XHTMLSerializer class which is
>> deprecated.
>> Maybe I should use DOM for this job? The document is usually small. Am I
>> missing something?
>
> There are two conditions for using an in-memory object tree like the DOM:
>
> 1. Your documents are relatively small.
> 2. Operations are relatively infrequent.
>
> For example, if you're dealing with 100 MB documents, the DOM is
> definitely out; likewise, if you're working on a server that will
> handle hundreds or thousands of requests per second or a
> batch-processing system that will open tens of thousands of XML
> documents quickly, then DOM is probably out. If neither of these
> applies, then go ahead and use the DOM or another in-memory object
> tree.
>
> If you do need to use SAX, then the first thing to understand is that
> SAX does not actually include a writing component -- whatever Xerces
> uses is outside of SAX proper. SAX2 does support *reading* DOCTYPE
> information through its optional LexicalHandler interface:
>
> http://www.saxproject.org/apidoc/org/xml/sax/ext/LexicalHandler.html
>
> Finally, unless you do something special inside the DOCTYPE
> declaration (like declaring entities or extending XHTML in some way),
> you can just generate a boilerplate DOCTYPE when you write out your
> document.
>
>
> All the best,
>
>
> David
>

-- 
Roberto Cosenza
ICQ 12231605, MSN & Jabber robcos AT robcos.com
Tel: +46-(0)70-4660928
Work Tel: +46-(0)8-55576860, Fax: +46-(0)8-55576861
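
The implicit shape="rect" attribute most likely comes from the DTD: the XHTML 1.0 DTDs declare a default value of "rect" for the shape attribute of <a>, and a parser that reads the DTD reports defaulted attributes along with the written ones. The sketch below shows one possible way to deal with both points raised in this thread: it reads the DOCTYPE through LexicalHandler and uses the SAX2 extension interface Attributes2 to tell written attributes from defaulted ones. The class names SpecifiedAttributesDemo and Recorder and the sample document are made up for illustration. The sample declares the default in an internal subset and resolves the external DTD to an empty stream so that it runs without network access. It also assumes the parser hands out Attributes2 objects, which is optional for SAX2 parsers, so the code falls back to treating every attribute as written when it does not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.ext.DefaultHandler2;

public class SpecifiedAttributesDemo {

    // Illustrative input: the internal subset declares the same default
    // for "shape" that the XHTML 1.0 DTDs declare for <a>.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n"
      + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\" [\n"
      + "  <!ATTLIST a shape (rect|circle|poly|default) \"rect\">\n"
      + "]>\n"
      + "<html><body><a href=\"mylink.html\">mylink</a></body></html>";

    /** Hypothetical handler: remembers the DOCTYPE and sorts attributes. */
    static class Recorder extends DefaultHandler2 {
        String doctypeName;
        String publicId;
        String systemId;
        final List<String> specified = new ArrayList<>();
        final List<String> defaulted = new ArrayList<>();

        // LexicalHandler callback: fired when the DOCTYPE declaration starts.
        @Override
        public void startDTD(String name, String publicId, String systemId) {
            this.doctypeName = name;
            this.publicId = publicId;
            this.systemId = systemId;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            for (int i = 0; i < atts.getLength(); i++) {
                // Without Attributes2 we cannot tell, so treat as written.
                boolean written = !(atts instanceof Attributes2)
                        || ((Attributes2) atts).isSpecified(i);
                String label = qName + "@" + atts.getQName(i);
                (written ? specified : defaulted).add(label);
            }
        }
    }

    static Recorder parse(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder);
        // LexicalHandler is registered as a property, not through a setter.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", recorder);
        // Assumption for this sketch only: skip the external DTD so the
        // example needs no network. A real application may want to read it.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = parse(SAMPLE);
        System.out.println("DOCTYPE " + r.doctypeName + " PUBLIC " + r.publicId);
        System.out.println("written:   " + r.specified);
        System.out.println("defaulted: " + r.defaulted);
    }
}
```

A serializer built on top of this could write out only the attributes in the "written" list and re-emit the recorded DOCTYPE instead of a fixed one. Filtering in the handler leaves the parser configuration alone, which seems safer than trying to stop the parser from reading the DTD, since entity declarations in the DTD may still be needed.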