From: Petr C. <pci...@us...> - 2004-02-19 10:00:23
Update of /cvsroot/perl-xml/sax-perl-org
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17540

Modified Files:
 index.html
Added Files:
 changes-2.1 sax-2.1-adv.html sax-2.1-idx.html sax-2.1-ref.html sax-2.1.html
Log Message:

--- NEW FILE: changes-2.1 ---
=======================================
= Perl SAX 2.1 Changes and Issues =
=======================================

The purpose of this file is to record changes made since Perl SAX 2.0 and to track related issues.

---------------------------------------
Changes
---------------------------------------

[C1] XMLVersion and Encoding fields added to the document locator (as in the Locator2 interface of SAX 2.0 Ext. 1.1)

---------------------------------------
Issues
---------------------------------------

[I1] Resolution: none

A parser should advertise the SAX version it supports. There can be a new method ($parser->get_sax_version()) or a read-only feature (http://xmlns.perl.org/sax/version). This feature should also be introduced into Perl SAX 2.0 retrospectively, to distinguish between 1.0, 2.0 and 2.1 drivers.

[I2] Resolution: none

Currently, the "http://xml.org/sax/handlers/LexicalHandler" feature on the parser needs to be set to the object that is to receive lexical events. If the reader does not support lexical events, it will throw an XML::SAX::Exception::NotRecognized or an XML::SAX::Exception::NotSupported when you attempt to register the handler. DeclHandler works in the same way. Actually, this is a theory - XML::SAX::Base doesn't implement this currently.

This approach is very different from the common PerlSAX mechanism: look for a specific handler, then look for a handler method on the default handler, and ignore the callback when neither is found. It would be more 'perlish' to apply this simple mechanism to LexicalHandler and DeclHandler too.

If we want these two to be extension handlers (compliant 2.1 parsers are not required to support them), there could be read-only features to let apps know whether extension handlers are supported or not (http://xmlns.perl.org/sax/LexicalHandler, DeclHandler).

[I3] Resolution: none

SAX 2.0 Ext. 1.1 has a new Attributes2 interface which extends attributes with new properties (Declared, Specified) to distinguish between attributes specified in an XML doc and those declared in the DTD. This could be introduced into Perl SAX 2.1 as an optional extension (advertised by a feature).

--- NEW FILE: sax-2.1-adv.html ---
<!-- $Id: sax-2.1-adv.html,v 1.1 2004/02/19 09:50:04 pcimprich Exp $ -->
<html>
<head>
<title>Advanced Features of the Perl SAX 2.1 Binding</title>
<meta name="keywords" content="XML SGML SAX Perl libxml libxml-perl" />
</head>
<body>
<h1>Advanced SAX</h1>
<p>The classes, methods, and features described below are not commonly used in most applications and can be ignored by most users. If, however, you find that you are not getting the granularity you expect from Basic SAX, this would be the place to look for more. Advanced SAX isn't advanced in the sense that it is harder, or requires better programming skills. It is simply more complete, and has been separated to keep Basic SAX simple in terms of the number of events one would have to deal with. </p>

[...975 lines suppressed...]

<li> Method names have been converted to lower-case with underscores. Parameters are all mixed case with initial upper-case. </li>
</ul>
<p> If compatibility is a problem for you, consider writing a Filter that converts from this style to the one you want. It is likely that such a Filter will be available from CPAN in the not-too-distant future.
</p> <!-- <p>[FIXME: need to list package/class name equivalents for all hashes.]</p> --> </body> </html> --- NEW FILE: sax-2.1-idx.html --- <!-- $Id: sax-2.1-idx.html,v 1.1 2004/02/19 09:50:04 pcimprich Exp $ --> <html> <head> <title>Perl SAX 2.1</title> <style> a {text-decoration:none} div {font-size: 14px; color: #777777;} div.box {padding:4px 2px 4px 2px; margin:0px 0px 10px 0px; border:1px solid #777777;} div.title {font-size:16px; font-weight:bold; float:left; padding:0px 0px 0px 2px;} div.item {padding: 1px 0px 0px 10px; clear:left;} div.right {float:right; padding:0px 5px 0px 0px;} </style> </head> <body> <div class="box" style="background-color:#eeeeee"> <div class="title">Perl SAX 2.1</div> <div class="right">[<a href="http://sax.perl.org">home</a>]</div> <div class="item"> <a href="#parser">Parser</a> </div> <div class="item"> <a href="#content">ContentHandler</a> </div> <div class="item"> <a href="#error">ErrorHandler</a> </div> <div class="item"> <a href="#lexical">LexicalHandler</a> </div> <div class="item"> <a href="#dtd">DTDHandler</a> </div> <div class="item"> <a href="#decl">DeclHandler</a> </div> <div class="item"> <a href="#resolver">Entity Resolver</a> </div> <div class="item"> <a href="#other">other objects</a> </div> </div> <div class="box"> <div class="title"><a name="parser"/>Parser</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1.html#parse" target="cnt">parse()</a> </div> <div class="item"> <a href="sax-2.1.html#parseFile" target="cnt">parse_file()</a> </div> <div class="item"> <a href="sax-2.1.html#parseString" target="cnt">parse_string()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#getFeature" target="cnt">get_feature()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#setFeature" target="cnt">set_feature()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#getFeatures" target="cnt">get_features()</a> </div> </div> <div class="box"> <div class="title"><a 
name="content"/>ContentHandler</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1-adv.html#setDocumentLocator" target="cnt">set_document_locator()</a> </div> <div class="item"> <a href="sax-2.1.html#startDocument" target="cnt">start_document()</a> </div> <div class="item"> <a href="sax-2.1.html#endDocument" target="cnt">end_document()</a> </div> <div class="item"> <a href="sax-2.1.html#startElement" target="cnt">start_element()</a> </div> <div class="item"> <a href="sax-2.1.html#endElement" target="cnt">end_element()</a> </div> <div class="item"> <a href="sax-2.1.html#characters" target="cnt">characters()</a> </div> <div class="item"> <a href="sax-2.1.html#ignorableWhitespace" target="cnt">ignorable_whitespace()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#startPrefixMapping" target="cnt">start_prefix_mapping()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#endPrefixMapping" target="cnt">end_prefix_mapping()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#processingInstruction" target="cnt">processing_instruction()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#skippedEntity" target="cnt">skipped_entity()</a> </div> </div> <div class="box"> <div class="title"><a name="error"/>ErrorHandler</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1-adv.html#warning" target="cnt">warning()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#error" target="cnt">error()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#fatalError" target="cnt">fatal_error()</a> </div> </div> <div class="box"> <div class="title"><a name="lexical"/>LexicalHandler</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1-adv.html#startDTD" target="cnt">start_dtd()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#endDTD" target="cnt">end_dtd()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#startEntity" 
target="cnt">start_entity()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#endEntity" target="cnt">end_entity()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#startCDATA" target="cnt">start_cdata()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#endCDATA" target="cnt">end_cdata()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#comment" target="cnt">comment()</a> </div> </div> <div class="box"> <div class="title"><a name="dtd"/>DTDHandler</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1-adv.html#notationDecl" target="cnt">notation_decl()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#unparsedEntity" target="cnt">unparsed_entity_decl()</a> </div> </div> <div class="box"> <div class="title"><a name="decl"/>DeclHandler</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1-adv.html#elementDecl" target="cnt">element_decl()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#attributeDecl" target="cnt">attribute_decl()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#internalEntityDecl" target="cnt">internal_entity_decl()</a> </div> <div class="item"> <a href="sax-2.1-adv.html#externalEntityDecl" target="cnt">external_entity_decl()</a> </div> </div> <div class="box"> <div class="title"><a name="resolver"/>EntityResolver</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1-adv.html#resolveEntity" target="cnt">resolve_entity()</a> </div> </div> <div class="box"> <div class="title"><a name="other"/>other objects</div> <div class="right">[<a href="#top">top</a>]</div> <div class="item"> <a href="sax-2.1.html#Exceptions" target="cnt">Exceptions</a> </div> <div class="item"> <a href="sax-2.1-adv.html#InputSources" target="cnt">Input Sources</a> </div> <div class="item"> <a href="sax-2.1-adv.html#Features" target="cnt">Features</a> </div> </div> </body> </html> --- NEW FILE: sax-2.1-ref.html --- <!DOCTYPE 
html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<frameset cols="225, *">
<noframes> <a href="sax-2.1.html">Perl SAX 2.1 Binding</a> </noframes>
<frame name="idx" id="idx" src="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/sax-perl-org/sax-2.1-idx.html?rev=HEAD&content-type=text/html" />
<frame name="cnt" id="cnt" src="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/sax-perl-org/sax-2.1.html?rev=HEAD&content-type=text/html" />
</frameset>

--- NEW FILE: sax-2.1.html ---
<!-- $Id: sax-2.1.html,v 1.1 2004/02/19 09:50:04 pcimprich Exp $ -->
<html>
<head>
<title>Perl SAX 2.1 Binding</title>
</head>
<body>
<h1>Perl SAX 2.1 Binding</h1>
<p>SAX (Simple API for XML) is a common parser interface for XML parsers. It allows application writers to write applications that use XML parsers, but are independent of which parser is actually used.</p>
<p>This document describes the version of SAX used by Perl modules. The original version of SAX 2.0, for Java, is described at <a href="http://sax.sourceforge.net/">http://sax.sourceforge.net/</a>.</p>
<p>There are two basic interfaces in the Perl version of SAX, the parser interface and the handler interface. The parser interface creates new parser instances, starts parsing, and provides additional information to handlers on request. The handler interface is used to receive parse events from the parser. This pattern is also commonly called "Producer and Consumer" or "Generator and Sink". Note that the parser doesn't have to be an XML parser; all it needs to do is provide a stream of events to the handler as if it were parsing XML. The actual data from which the events are generated can be anything: a Perl object, a CSV file, a database table...
</p>
<p>SAX is typically used like this: <pre>
my $handler = MyHandler->new();
my $parser = AnySAXParser->new( Handler => $handler );
$parser->parse($uri);
</pre></p>
<p>Handlers are typically written like this: <pre>
package MyHandler;

# Constructor: the handler is a plain blessed hash.
sub new {
    my $type = shift;
    return bless {}, $type;
}

# Called at the start of every element.
sub start_element {
    my ($self, $element) = @_;
    print "Starting element $element->{Name}\n";
}

# Called at the end of every element.
sub end_element {
    my ($self, $element) = @_;
    print "Ending element $element->{Name}\n";
}

# Called for each chunk of character data.
sub characters {
    my ($self, $characters) = @_;
    print "characters: $characters->{Data}\n";
}

1;
</pre></p>
<h2>Basic SAX Parser</h2>
<p>These methods and options are the most commonly used with SAX parsers and event generators.</p>
<p>Applications may not invoke a <tt>parse()</tt> method again while a parse is in progress (they should instead create a new SAX parser for each nested XML document). Once a parse is complete, an application may reuse the same parser object, possibly with a different input source.</p>
<p>During the parse, the parser will provide information about the XML document through the registered event handlers. Note that an event that hasn't been registered (i.e. one that doesn't have a corresponding method in the handler's class) will <b>not</b> be called. This allows one to receive only the events one is interested in. </p>
<p>If you generate SAX events, data are required to be passed to handler methods with all properties defined in this document unless otherwise specified. </p>
<p><a name="parse"/>
<dl><dt><b><tt class='function'>parse</tt></b>(<var>uri</var> [, <var>options</var>])</dt>
<dd> Parses the XML instance identified by <var>uri</var> (a system identifier). <var>options</var> can be a list of option/value pairs or a hash reference. Options include <tt>Handler</tt>, features and properties, and advanced SAX parser options. <tt>parse()</tt> returns the result of calling the <tt>end_document()</tt> handler.
The options supported by <tt>parse()</tt> may vary slightly if what is being "parsed" isn't XML. </dd></dl></p>
<p><a name="parseFile"/>
<dl><dt><b><tt class='function'>parse_file</tt></b>(<var>stream</var> [, <var>options</var>])</dt>
<dd> Parses the XML instance in the already opened <var>stream</var>, an IO::Handle or similar. <var>options</var> are the same as for <tt class='function'>parse()</tt>. <tt>parse_file()</tt> returns the result of calling the <tt>end_document()</tt> handler.</dd></dl></p>
<p><a name="parseString"/>
<dl><dt><b><tt class='function'>parse_string</tt></b>(<var>string</var> [, <var>options</var>])</dt>
<dd> Parses the XML instance in <var>string</var>. <var>options</var> are the same as for <tt class='function'>parse()</tt>. <tt>parse_string()</tt> returns the result of calling the <tt>end_document()</tt> handler.</dd></dl></p>
<p>
<dl><dt><b><tt>Handler</tt></b></dt>
<dd> The default handler object to receive all events from the parser. Applications may change <tt>Handler</tt> in the middle of the parse and the SAX parser will begin using the new handler immediately. The <a href="sax-2.1-adv.html">Advanced SAX</a> document lists a number of more specialized handlers that can be used should you wish to dispatch different types of events to different objects. </dd></dl></p>
<h2><a name="BasicHandler">Basic SAX Handler</a></h2>
<p>These methods are the most commonly used by SAX handlers.</p>
<p><a name="startDocument"/>
<dl><dt><b><tt class='function'>start_document</tt></b>(<var>document</var>)</dt>
<dd> Receive notification of the beginning of a document.
<p>The SAX parser will invoke this method only once, before any other methods (except for <tt>set_document_locator()</tt> in advanced SAX handlers).</p> No properties are defined for this event (<var>document</var> is empty).</dd></dl></p> <p><a name="endDocument"/> <dl><dt><b><tt class='function'>end_document</tt></b>(<var>document</var>)</dt> <dd> Receive notification of the end of a document. <p>The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.</p> <p>No properties are defined for this event (<var>document</var> is empty).</p> The return value of <tt>end_document()</tt> is returned by the parser's <tt>parse()</tt> methods.</dd></dl></p> <p><a name="startElement"/> <dl><dt><b><tt class='function'>start_element</tt></b>(<var>element</var>)</dt> <dd> Receive notification of the start of an element. <p>The Parser will invoke this method at the beginning of every element in the XML document; there will be a corresponding <tt>end_element()</tt> event for every <tt>start_element()</tt> event (even when the element is empty). 
All of the element's content will be reported, in order, before the corresponding <tt>end_element()</tt> event.</p> <var>element</var> is a hash reference with these properties: <blockquote> <table> <tr><td><b><tt>Name</tt></b></td> <td>The element type name (including prefix).</td></tr> <tr><td><b><tt>Attributes</tt></b></td> <td>The attributes attached to the element, if any.</td></tr> </table> </blockquote> If namespace processing is turned on (which is the default), these properties are also available: <blockquote> <table> <tr><td><b><tt>NamespaceURI</tt></b></td> <td>The namespace of this element.</td></tr> <tr><td><b><tt>Prefix</tt></b></td> <td>The namespace prefix used on this element.</td></tr> <tr><td><b><tt>LocalName</tt></b></td> <td>The local name of this element.</td></tr> </table> </blockquote> <tt>Attributes</tt> is a reference to hash keyed by JClark namespace notation. That is, the keys are of the form "{NamespaceURI}LocalName". If the attribute has no NamespaceURI, then it is simply "{}LocalName". Each attribute is a hash reference with these properties: <blockquote> <table> <tr><td><b><tt>Name</tt></b></td> <td>The attribute name (including prefix).</td></tr> <tr><td><b><tt>Value</tt></b></td> <td>The normalized value of the attribute.</td></tr> <tr><td><b><tt>NamespaceURI</tt></b></td> <td>The namespace of this attribute.</td></tr> <tr><td><b><tt>Prefix</tt></b></td> <td>The namespace prefix used on this attribute.</td></tr> <tr><td><b><tt>LocalName</tt></b></td> <td>The local name of this attribute.</td></tr> </table> </blockquote> </dd> </dl> </p> <p><a name="endElement"/> <dl><dt><b><tt class='function'>end_element</tt></b>(<var>element</var>)</dt> <dd> Receive notification of the end of an element. 
<p>The SAX parser will invoke this method at the end of every element in the XML document; there will be a corresponding <tt class='function'>start_element()</tt> event for every <tt class='function'>end_element()</tt> event (even when the element is empty).</p> <var>element</var> is a hash reference with these properties: <blockquote> <table> <tr><td><b><tt>Name</tt></b></td> <td>The element type name (including prefix).</td></tr> </table> </blockquote> If namespace processing is turned on (which is the default), these properties are also available: <blockquote> <table> <tr><td><b><tt>NamespaceURI</tt></b></td> <td>The namespace of this element.</td></tr> <tr><td><b><tt>Prefix</tt></b></td> <td>The namespace prefix used on this element.</td></tr> <tr><td><b><tt>LocalName</tt></b></td> <td>The local name of this element.</td></tr> </table> </blockquote></dd> </dl></p> <p><a name="characters"/> <dl><dt><b><tt class='function'>characters</tt></b>(<var>characters</var>)</dt> <dd> Receive notification of character data. <p>The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks (however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information).</p> <p><var>characters</var> is a hash reference with this property:</p> <blockquote> <table> <tr><td><b><tt>Data</tt></b></td> <td>The characters from the XML document.</td></tr> </table> </blockquote></dd> </dl></p> <p><a name="ignorableWhitespace"/> <dl><dt><b><tt class='function'>ignorable_whitespace</tt></b>(<var>characters</var>)</dt> <dd> Receive notification of ignorable whitespace in element content. 
<p>Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.</p> <p>SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.</p> <p><var>characters</var> is a hash reference with this property:</p> <blockquote> <table> <tr><td><b><tt>Data</tt></b></td> <td>The whitespace characters from the XML document.</td></tr> </table> </blockquote></dd> </dl></p> <h2><a name="Exceptions">Exceptions</a></h2> <p> Conformant XML parsers are required to abort processing when well-formedness or validation errors occur. In Perl, SAX parsers use <tt>die()</tt> to signal these errors. To catch these errors and prevent them from killing your program, use <tt>eval{}</tt>: <pre> eval { $parser->parse($uri) }; if ($@) { # handle error } </pre> </p> <p> Exceptions can also be thrown when setting features or properties on the SAX parser (see advanced SAX below).</p> <p> Exception values (<tt>$@</tt>) in SAX are hash references blessed into the package that defines their type, and have the following properties: </p> <blockquote> <table> <tr><td><b><tt>Message</tt></b></td> <td>A detail message for this exception.</td></tr> <tr><td><b><tt>Exception</tt></b></td> <td>The embedded exception, or <tt>undef</tt> if there is none.</td></tr> </table> </blockquote> If the exception is raised due to parse errors, these properties are also available: <blockquote> <table> <tr><td><b><tt>ColumnNumber</tt></b></td> <td>The column number of the end of the text where the exception occurred.</td></tr> <tr><td><b><tt>LineNumber</tt></b></td> <td>The line number of the end of the text where the exception occurred.</td></tr> 
<tr><td><b><tt>PublicId</tt></b></td> <td>The public identifier of the entity where the exception occurred.</td></tr>
<tr><td><b><tt>SystemId</tt></b></td> <td>The system identifier of the entity where the exception occurred.</td></tr>
</table>
</blockquote>
<p></p><hr />
<h2>Advanced SAX</h2>
<ul>
<li><a href="sax-2.1-adv.html#Parsers">SAX Parsers</a></li>
<li><a href="sax-2.1-adv.html#Features">Features</a></li>
<li><a href="sax-2.1-adv.html#InputSources">Input Sources</a></li>
<li><a href="sax-2.1-adv.html#Handlers">SAX Handlers</a></li>
<li><a href="sax-2.1-adv.html#Filters">SAX Filters</a></li>
<li><a href="sax-2.1-adv.html#Java">Java and DOM Compatibility</a></li>
</ul>
</body>
</html>

Index: index.html
===================================================================
RCS file: /cvsroot/perl-xml/sax-perl-org/index.html,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -u -d -r1.1.1.1 -r1.2
--- index.html 27 Jan 2002 14:02:40 -0000 1.1.1.1
+++ index.html 19 Feb 2004 09:50:04 -0000 1.2
@@ -25,6 +25,25 @@
 </ul>
 </td>
 </tr>
+
+ <tr>
+ <td><b>SAX 2.1 Working Drafts</b></td>
+ </tr>
+ <tr>
+ <td>
+ <ul>
+ <li><a href="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/sax-perl-org/sax-2.1.html?rev=HEAD&content-type=text/html">Perl
+ SAX 2.1 Binding</a></li>
+ <li><a href="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/sax-perl-org/sax-2.1-adv.html?rev=HEAD&content-type=text/html">Perl
+ SAX 2.1 Advanced</a></li>
+ <li><a href="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/sax-perl-org/sax-2.1-ref.html?rev=HEAD&content-type=text/html">Perl
+ SAX 2.1 Reference</a></li>
+ <li><a href="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/sax-perl-org/changes-2.1?rev=HEAD&content-type=text/plain">Changes
+ and Issues</a></li>
+ </ul>
+ </td>
+ </tr>
+
 <tr>
 <td><b>Misc.</b></td>
 </tr>