From: A.M. K. <aku...@us...> - 2000-10-12 02:37:18
Update of /cvsroot/py-howto/pyhowto
In directory slayer.i.sourceforge.net:/tmp/cvs-serv6586

Modified Files:
    python-2.0.tex
Log Message:
Add new section on the XML package. (This was the only major new 2.0
feature left that wasn't covered. The article is therefore now
essentially complete.)

A few minor changes

Index: python-2.0.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/python-2.0.tex,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -r1.35 -r1.36
*** python-2.0.tex 2000/10/04 12:40:44 1.35
--- python-2.0.tex 2000/10/12 02:37:14 1.36
***************
*** 157,162 ****
  distribution; it's also available on the Web at
  \url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
! This article will simply cover the most significant points from the
! full interface.
  
  In Python source code, Unicode strings are written as
--- 157,162 ----
  distribution; it's also available on the Web at
  \url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
! This article will simply cover the most significant points about the Unicode
! interfaces.
  
  In Python source code, Unicode strings are written as
***************
*** 616,625 ****
  
  The comparison \code{a==b} returns true, because the two recursive
! data structures are isomorphic. \footnote{See the thread ``trashcan
  and PR\#7'' in the April 2000 archives of the python-dev mailing list
  for the discussion leading up to this implementation, and some useful
  relevant links.
! %http://www.python.org/pipermail/python-dev/2000-April/004834.html
! }
  
  Work has been done on porting Python to 64-bit Windows on the Itanium
--- 616,625 ----
  
  The comparison \code{a==b} returns true, because the two recursive
! data structures are isomorphic. See the thread ``trashcan
  and PR\#7'' in the April 2000 archives of the python-dev mailing list
  for the discussion leading up to this implementation, and some useful
  relevant links.
! % Starting URL:
! % http://www.python.org/pipermail/python-dev/2000-April/004834.html
  
  Work has been done on porting Python to 64-bit Windows on the Itanium
***************
*** 951,955 ****
  setup (name = "PyXML", version = "0.5.4",
         ext_modules =[ expat_extension ] )
- 
  \end{verbatim}
  
--- 951,954 ----
***************
*** 967,975 ****
  Modules}, that joins the basic set of Python documentation.
  
! % ======================================================================
! %\section{New XML Code}
! %XXX write this section...
  
  % ======================================================================
  \section{Module changes}
--- 966,1129 ----
  Modules}, that joins the basic set of Python documentation.
  
! ======================================================================
! \section{XML Modules}
! 
! Python 1.5.2 included a simple XML parser in the form of the
! \module{xmllib} module, contributed by Sjoerd Mullender. Since
! 1.5.2's release, two different interfaces for processing XML have
! become common: SAX2 (version 2 of the Simple API for XML) provides an
! event-driven interface with some similarities to \module{xmllib}, and
! the DOM (Document Object Model) provides a tree-based interface,
! transforming an XML document into a tree of nodes that can be
! traversed and modified. Python 2.0 includes a SAX2 interface and a
! stripped-down DOM interface as part of the \module{xml} package.
! Here we will give a brief overview of these new interfaces; consult
! the Python documentation or the source code for complete details.
! The Python XML SIG is also working on improved documentation.
! 
! \subsection{SAX2 Support}
! 
! SAX defines an event-driven interface for parsing XML. To use SAX,
! you must write a SAX handler class. Handler classes inherit from
! various classes provided by SAX, and override various methods that
! will then be called by the XML parser. For example, the
! \method{startElement} and \method{endElement} methods are called for
! every starting and end tag encountered by the parser, the
! \method{characters()} method is called for every chunk of character
! data, and so forth.
! 
! The advantage of the event-driven approach is that the whole
! document doesn't have to be resident in memory at any one time, which
! matters if you are processing really huge documents. However, writing
! the SAX handler class can get very complicated if you're trying to
! modify the document structure in some elaborate way.
! 
! For example, this little example program defines a handler that prints
! a message for every starting and ending tag, and then parses the file
! \file{hamlet.xml} using it:
! 
! \begin{verbatim}
! from xml import sax
! 
! class SimpleHandler(sax.ContentHandler):
!     def startElement(self, name, attrs):
!         print 'Start of element:', name, attrs.keys()
! 
!     def endElement(self, name):
!         print 'End of element:', name
! 
! # Create a parser object
! parser = sax.make_parser()
! 
! # Tell it what handler to use
! handler = SimpleHandler()
! parser.setContentHandler( handler )
! 
! # Parse a file!
! parser.parse( 'hamlet.xml' )
! \end{verbatim}
! 
! For more information, consult the Python documentation, or the XML
! HOWTO at \url{http://www.python.org/doc/howto/xml/}.
! 
! \subsection{DOM Support}
! 
! The Document Object Model is a tree-based representation for an XML
! document. A top-level \class{Document} instance is the root of the
! tree, and has a single child which is the top-level \class{Element}
! instance. This \class{Element} has children nodes representing
! character data and any sub-elements, which may have further children
! of their own, and so forth. Using the DOM you can traverse the
! resulting tree any way you like, access element and attribute values,
! insert and delete nodes, and convert the tree back into XML.
! 
! The DOM is useful for modifying XML documents, because you can create
! a DOM tree, modify it by adding new nodes or rearranging subtrees, and
! then produce a new XML document as output. You can also construct a
! DOM tree manually and convert it to XML, which can be a more flexible
! way of producing XML output than simply writing
! \code{<tag1>}...\code{</tag1>} to a file.
! 
! The DOM implementation included with Python lives in the
! \module{xml.dom.minidom} module. It's a lightweight implementation of
! the Level 1 DOM with support for XML namespaces. The
! \function{parse()} and \function{parseString()} convenience
! functions are provided for generating a DOM tree:
! 
! \begin{verbatim}
! from xml.dom import minidom
! doc = minidom.parse('hamlet.xml')
! \end{verbatim}
! \code{doc} is a \class{Document} instance. \class{Document}, like all
! the other DOM classes such as \class{Element} and \class{Text}, is a
! subclass of the \class{Node} base class. All the nodes in a DOM tree
! therefore support certain common methods, such as \method{toxml()}
! which returns a string containing the XML representation of the node
! and its children. Each class also has special methods of its own; for
! example, \class{Element} and \class{Document} instances have a method
! to find all child elements with a given tag name. Continuing from the
! previous 2-line example:
  
+ \begin{verbatim}
+ perslist = doc.getElementsByTagName( 'PERSONA' )
+ print perslist[0].toxml()
+ print perslist[1].toxml()
+ \end{verbatim}
+ 
+ For the \textit{Hamlet} XML file, the above few lines output:
+ 
+ \begin{verbatim}
+ <PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
+ <PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
+ \end{verbatim}
+ 
+ The root element of the document is available as
+ \code{doc.documentElement}, and its children can be easily modified
+ by deleting, adding, or removing nodes:
+ 
+ \begin{verbatim}
+ root = doc.documentElement
+ 
+ # Remove the first child
+ root.removeChild( root.childNodes[0] )
+ 
+ # Move the new first child to the end
+ root.appendChild( root.childNodes[0] )
+ 
+ # Insert the new first child (originally,
+ # the third child) before the 20th child.
+ root.insertBefore( root.childNodes[0], root.childNodes[20] )
+ \end{verbatim}
+ 
+ Again, I will refer you to the Python documentation for a complete
+ listing of the different \class{Node} classes and their various methods.
+ 
+ \subsection{Relationship to PyXML}
+ 
+ The XML Special Interest Group has been working on XML-related Python
+ code for a while. Its code distribution, called PyXML, is available
+ from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}.
+ The PyXML distribution also used the package name \samp{xml}. If
+ you've written programs that used PyXML, you're probably wondering
+ about its compatibility with the 2.0 \module{xml} package.
+ 
+ The answer is that Python 2.0's \module{xml} package isn't compatible
+ with PyXML, but can be made compatible by installing a recent version
+ of PyXML. Many applications can get by with the XML support that is
+ included with Python 2.0, but more complicated applications will
+ require that the full PyXML package be installed. When
+ installed, PyXML versions 0.6.0 or greater will replace the
+ \module{xml} package shipped with Python, and will be a strict
+ superset of the standard package, adding a bunch of additional
+ features. Some of the additional features in PyXML include:
+ 
+ \begin{itemize}
+ \item 4DOM, a full DOM implementation
+ from FourThought LLC.
+ \item The xmlproc validating parser, written by Lars Marius Garshol.
+ \item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh.
+ \end{itemize}
+ 
  % ======================================================================
  \section{Module changes}
***************
*** 982,985 ****
--- 1136,1141 ----
  and \module{nntplib}. Consult the CVS logs for the exact
  patch-by-patch details.
+ 
+ % XXX gettext support
  
  Brian Gallew contributed OpenSSL support for the \module{socket}