From: A.M. K. <aku...@us...> - 2001-09-20 21:19:07
Update of /cvsroot/py-howto/pyhowto
In directory usw-pr-cvs1:/tmp/cvs-serv32602

Modified Files:
xml-howto.tex

Log Message:
Note things that need to be written
Revise abstract
Change author & e-mail address
Rewrite the DOM section somewhat (it still needs a lot of work)
Delete glossary

Index: xml-howto.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/xml-howto.tex,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -r1.12 -r1.13
*** xml-howto.tex 2001/09/20 03:04:32 1.12
--- xml-howto.tex 2001/09/20 21:19:05 1.13
*************** *** 1,4 **** --- 1,10 ---- \documentclass{howto} + % $Id$ + + % XXX not covered: c14n.py, xml.marshal, scripts, namespaces, TREX/schemas, + % XSLT, XPath + % XXX overview of parsers + \newcommand{\element}[1]{\code{#1}} \newcommand{\attribute}[1]{\code{#1}}
*************** *** 8,13 **** \release{0.6.1} ! \author{The Python/XML Special Interest Group} ! \authoraddress{\email{xm...@py...}\break (edited by \email{am...@bi...})} \begin{document} --- 14,19 ---- \release{0.6.1} ! \author{A.M. Kuchling} ! \authoraddress{\email{aku...@me...}} \begin{document}
*************** *** 16,20 **** \begin{abstract} \noindent ! XML is the eXtensible Markup Language, a subset of SGML, intended to allow the creation and processing of application-specific markup languages. Python makes an excellent language for processing XML --- 22,26 ---- \begin{abstract} \noindent ! XML is the eXtensible Markup Language, a subset of SGML intended to allow the creation and processing of application-specific markup languages. Python makes an excellent language for processing XML
*************** *** 22,28 **** assumes you're already familiar with the structure and terminology of XML. - - This is a draft document; 'XXX' in the text indicates that something - has to be filled in later, or rewritten, or verified, or something.
\end{abstract} --- 28,31 ---- *************** *** 613,640 **** The Document Object Model specifies a tree-based representation for an ! XML document. A top-level \class{Document} instance is the root of ! the tree, and has a single child which is the top-level ! \class{Element} instance; this \class{Element} has children nodes ! representing the content and any sub-elements, which may have further ! children, and so forth. Functions are defined which let you traverse ! the resulting tree any way you like, access element and attribute ! values, insert and delete nodes, and convert the tree back into XML. ! ! The DOM is useful for modifying XML documents, because you can create ! a DOM tree, modify it by adding new nodes and moving subtrees around, ! and then produce a new XML document as output. You can also construct ! a DOM tree yourself, and convert it to XML; this is often a more ! flexible way of producing XML output than simply writing ! \code{<tag1>}...\code{</tag1>} to a file. ! ! While the DOM doesn't require that the entire tree be resident in ! memory at one time, the Python DOM implementation currently does keep ! the whole tree in RAM. It's possible to write an implementation that ! stores most of the tree on disk or in a database, and reads in new ! sections as they're accessed, but this hasn't been done yet. ! This means you may not have enough memory to process very large ! documents as a DOM tree. A SAX handler, on the other hand, can ! potentially churn through amounts of data far larger than the ! available RAM. \subsection{Getting A DOM Tree} --- 616,642 ---- The Document Object Model specifies a tree-based representation for an ! XML document, as opposed to the event-driven processing provided by ! SAX. Both approaches have their uses. ! ! A top-level \class{Document} instance is the root of the tree, and has ! a single child which is the top-level \class{Element} instance; this ! 
\class{Element} has child nodes representing the content and any ! sub-elements, which may in turn have further children and so forth. ! There are different classes for everything that can be found in an XML ! document, so in addition to the \class{Element} class, there are also ! classes such as \class{Text}, \class{Comment}, \class{CDATASection}, ! \class{EntityReference}, and so on. Tree nodes provide methods for ! accessing the parent and child nodes, accessing element and attribute ! values, insert and delete nodes, and converting the tree back into XML. ! ! The DOM is often useful for modifying XML documents, because you can ! create a DOM tree, modify it by adding new nodes and moving subtrees ! around, and then produce a new XML document as output. On the other ! hand, while the DOM doesn't require that the entire tree be resident ! in memory at one time, the Python DOM implementation currently keeps ! the whole tree in RAM. This means you may not have enough memory to ! process very large documents as a DOM tree. A SAX handler, on the ! other hand, can potentially churn through amounts of data far larger ! than the available RAM. \subsection{Getting A DOM Tree} *************** *** 643,649 **** offers two alternative implementations of the DOM, \module{xml.dom.minidom} and \code{4DOM}. \module{xml.dom.minidom} is ! included in Python 2. It is a minimalistic implementation, which means ! it does not provide all interfaces and operations required by the DOM ! standard. \code{4DOM} (XXX reference) is a complete implementation of DOM Level 2 (which is currently work in progress), so we will use that in the examples. --- 645,651 ---- offers two alternative implementations of the DOM, \module{xml.dom.minidom} and \code{4DOM}. \module{xml.dom.minidom} is ! included in Python 2. It is a minimal implementation, which means it ! does not provide all interfaces and operations required by the DOM ! standard. 
\code{4DOM} (XXX reference) is a complete implementation of DOM Level 2 (which is currently work in progress), so we will use that in the examples. *************** *** 655,663 **** input (a file-like object, a string, a file name, and a URL, respectively). They all return a DOM \class{Document} object. \begin{verbatim} import sys ! from xml.dom.ext.reader.Sax import FromXmlStream ! from xml.dom.ext import PrettyPrint # parse the document --- 657,665 ---- input (a file-like object, a string, a file name, and a URL, respectively). They all return a DOM \class{Document} object. + % XXX these functions seem to be deprecated! why? \begin{verbatim} import sys ! from xml.dom.ext.reader.Sax2 import FromXmlStream # parse the document *************** *** 669,675 **** This HOWTO can't be a complete introduction to the Document Object Model, because there are lots of interfaces and lots of ! methods. Luckily, the DOM Recommendation is quite a readable document, ! so I'd recommend that you read it to get a complete picture of the ! available interfaces; this will only be a partial overview. The Document Object Model represents a XML document as a tree of --- 671,678 ---- This HOWTO can't be a complete introduction to the Document Object Model, because there are lots of interfaces and lots of ! methods. Luckily, the DOM Recommendation is quite readable as ! specifications go, so I'd recommend that you read it to get a complete ! picture of the available interfaces. This section will only be a ! partial overview. The Document Object Model represents a XML document as a tree of *************** *** 678,682 **** \class{Text}, and \class{Comment}. ! We'll use a single example document throughout this section. Here's the sample: \begin{verbatim} --- 681,686 ---- \class{Text}, and \class{Comment}. ! We'll use a single example document throughout this section. Here's ! 
the sample: \begin{verbatim} *************** *** 720,730 **** This isn't the only possible tree, because different parsers may differ in how they generate \class{Text} nodes; any of the ! \class{Text} nodes in the above tree might be split into multiple nodes.) \subsubsection{The \class{Node} class} ! We'll start by considering the basic \class{Node} class. All the ! other DOM nodes --- \class{Document}, \class{Element}, \class{Text}, ! and so forth --- are subclasses of \class{Node}. It's possible to perform many tasks using just the interface provided by \class{Node}. --- 724,735 ---- This isn't the only possible tree, because different parsers may differ in how they generate \class{Text} nodes; any of the ! \class{Text} nodes in the above tree might be split into multiple ! nodes.) \subsubsection{The \class{Node} class} ! e'll start by considering the basic \class{Node} class. All the ! other DOM nodes---\class{Document}, \class{Element}, \class{Text}, ! and so forth---are subclasses of \class{Node}. It's possible to perform many tasks using just the interface provided by \class{Node}. *************** *** 763,767 **** \member{documentElement} attribute contains the \class{Element} node for the root element. The \class{Document} node may have additional ! children, such as \class{ProcessingInstruction} nodes; the complete list of children XXX. --- 768,773 ---- \member{documentElement} attribute contains the \class{Element} node for the root element. The \class{Document} node may have additional ! children, such as \class{ProcessingInstruction} nodes; the complete ! list of children XXX. *************** *** 775,854 **** Introduction to the walker class - \subsection{Building A Document} - - Intro to builder - - \subsection{Processing HTML} - - Intro to HTML builder - - %Explanations, sample code, ... 
- \subsection{Related Links} - - \begin{seealso} - \seetitle[http://www.w3.org/DOM/]{Document Object Model (DOM)}{The - World Wide Web Consortium's DOM page.} - \seetitle[http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/] - {Document Object Model (DOM) Level 1 Specification}{The DOM - Level 1 Recommendation. Unlike most standards, this one - is actually pretty readable, particularly if you're only - interested in the Core XML interfaces.} - \end{seealso} - - - \section{Glossary \label{glossary}} - - XML has given rise to a sea of acronyms and terms. This section will - list the most significant terms, and sketch their relevance. - - Many of the following definitions are taken from Lars Marius Garshol's - SGML glossary, at \url{http://www.stud.ifi.uio.no/\~larsga/download/diverse/sgmlglos.html}. ! \begin{definitions} ! \term{DOM (Document Object Model)} ! % ! The Document Object Model is intended to be a platform- and ! language-neutral interface that will allow programs and scripts to ! dynamically access and update the content, structure and style of ! documents. Documents will be represented as tree structures which can ! be traversed and modified. ! ! \term{DTD (Document Type Definition)} ! % ! A Document Type Definition (nearly always called DTD) defines ! an XML document type, complete with element types, entities ! and an XML declaration. ! ! In other words: a DTD completely describes one particular kind ! of XML document, such as, for instance, HTML 3.2. ! ! \term{SAX (Simple API for XML)} ! % ! SAX is a simple standardized API for XML parsers developed by the ! contributors to the xml-dev mailing list. The interface is mostly ! language-independent, as long as the language is object-oriented; the ! first implementation was written for Java, but a Python implementation ! is also available. SAX is supported by many XML parsers. ! ! \term{XML (eXtensible Markup Language)} ! % ! XML is an SGML application profile specialized for use on the ! 
web and has its own standards for linking and stylesheets under development. ! ! %XML-Data ! ! \term{XSL (eXtensible Style Language)} ! % ! XSL is a proposal for a stylesheet language for XML, which ! enables browsers to lay out XML documents in an attractive ! manner, and also provides a way to convert XML documents to ! HTML. ! \end{definitions} ! ! %\section{Related Links} ! % ! %This section collects all ! %the links from the preceding sections. \end{document} --- 781,797 ---- Introduction to the walker class \subsection{Related Links} ! The DOM Level 1 Recommendation is at ! \url{http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/}. Unlike ! most standards, this one is actually pretty readable, particularly if ! you're only interested in the Core XML interfaces. ! ! Level 2 of the DOM has also been defined, adding more specialized ! features such as support for XML namespaces, events, and ranges. DOM ! Level 3 is still being worked on, and will add yet more features. The ! World Wide Web Consortium's DOM page at \url{http://www.w3.org/DOM/} ! has pointers to the specifications and current drafts for these higher ! DOM levels. \end{document} |
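
The rewritten DOM section says that tree nodes provide methods for accessing parent and child nodes, reading attribute values, inserting and deleting nodes, and converting the tree back into XML. A minimal sketch of those operations follows. It assumes the standard library's xml.dom.minidom rather than the 4DOM reader that the HOWTO's own example imports, and the sample document and the helper name describe() are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, not the one used in the HOWTO.
SAMPLE = "<root><!-- note --><item id='1'>hello</item></root>"


def describe(node, depth=0, out=None):
    """Collect an indented list of node names by walking childNodes."""
    if out is None:
        out = []
    out.append("  " * depth + node.nodeName)
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out


# The Document node is the root of the tree; documentElement is the
# top-level Element.
doc = minidom.parseString(SAMPLE)
root = doc.documentElement
lines = describe(doc)

# Access an element, its attribute value, and its parent.
item = root.getElementsByTagName("item")[0]
item_id = item.getAttribute("id")
parent_is_root = item.parentNode is root

# Insert a new element with a Text child, then delete the Comment node,
# which is the first child of the root element in this sample.
new_item = doc.createElement("item")
new_item.setAttribute("id", "2")
new_item.appendChild(doc.createTextNode("world"))
root.appendChild(new_item)
root.removeChild(root.firstChild)

# Convert the modified subtree back into XML.
xml_out = root.toxml()
print("\n".join(lines))
print(xml_out)
```

Since minidom keeps the whole tree in memory, the caveat in the revised text about very large documents applies to this sketch as well.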