From: A.M. K. <aku...@us...> - 2001-09-20 21:19:07
Update of /cvsroot/py-howto/pyhowto
In directory usw-pr-cvs1:/tmp/cvs-serv32602
Modified Files:
xml-howto.tex
Log Message:
Note things that need to be written
Revise abstract
Change author & e-mail address
Rewrite the DOM section somewhat (it still needs a lot of work)
Delete glossary
Index: xml-howto.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/xml-howto.tex,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -r1.12 -r1.13
*** xml-howto.tex 2001/09/20 03:04:32 1.12
--- xml-howto.tex 2001/09/20 21:19:05 1.13
***************
*** 1,4 ****
--- 1,10 ----
\documentclass{howto}
+ % $Id$
+
+ % XXX not covered: c14n.py, xml.marshal, scripts, namespaces, TREX/schemas,
+ % XSLT, XPath
+ % XXX overview of parsers
+
\newcommand{\element}[1]{\code{#1}}
\newcommand{\attribute}[1]{\code{#1}}
***************
*** 8,13 ****
\release{0.6.1}
! \author{The Python/XML Special Interest Group}
! \authoraddress{\email{xm...@py...}\break (edited by \email{am...@bi...})}
\begin{document}
--- 14,19 ----
\release{0.6.1}
! \author{A.M. Kuchling}
! \authoraddress{\email{aku...@me...}}
\begin{document}
***************
*** 16,20 ****
\begin{abstract}
\noindent
! XML is the eXtensible Markup Language, a subset of SGML, intended to
allow the creation and processing of application-specific markup
languages. Python makes an excellent language for processing XML
--- 22,26 ----
\begin{abstract}
\noindent
! XML is the eXtensible Markup Language, a subset of SGML intended to
allow the creation and processing of application-specific markup
languages. Python makes an excellent language for processing XML
***************
*** 22,28 ****
assumes you're already familiar with the structure and terminology of
XML.
-
- This is a draft document; 'XXX' in the text indicates that something
- has to be filled in later, or rewritten, or verified, or something.
\end{abstract}
--- 28,31 ----
***************
*** 613,640 ****
The Document Object Model specifies a tree-based representation for an
! XML document. A top-level \class{Document} instance is the root of
! the tree, and has a single child which is the top-level
! \class{Element} instance; this \class{Element} has children nodes
! representing the content and any sub-elements, which may have further
! children, and so forth. Functions are defined which let you traverse
! the resulting tree any way you like, access element and attribute
! values, insert and delete nodes, and convert the tree back into XML.
!
! The DOM is useful for modifying XML documents, because you can create
! a DOM tree, modify it by adding new nodes and moving subtrees around,
! and then produce a new XML document as output. You can also construct
! a DOM tree yourself, and convert it to XML; this is often a more
! flexible way of producing XML output than simply writing
! \code{<tag1>}...\code{</tag1>} to a file.
!
! While the DOM doesn't require that the entire tree be resident in
! memory at one time, the Python DOM implementation currently does keep
! the whole tree in RAM. It's possible to write an implementation that
! stores most of the tree on disk or in a database, and reads in new
! sections as they're accessed, but this hasn't been done yet.
! This means you may not have enough memory to process very large
! documents as a DOM tree. A SAX handler, on the other hand, can
! potentially churn through amounts of data far larger than the
! available RAM.
\subsection{Getting A DOM Tree}
--- 616,642 ----
The Document Object Model specifies a tree-based representation for an
! XML document, as opposed to the event-driven processing provided by
! SAX. Both approaches have their uses.
!
! A top-level \class{Document} instance is the root of the tree, and has
! a single child which is the top-level \class{Element} instance; this
! \class{Element} has child nodes representing the content and any
! sub-elements, which may in turn have further children and so forth.
! There are different classes for everything that can be found in an XML
! document, so in addition to the \class{Element} class, there are also
! classes such as \class{Text}, \class{Comment}, \class{CDATASection},
! \class{EntityReference}, and so on. Tree nodes provide methods for
! accessing the parent and child nodes, accessing element and attribute
! values, inserting and deleting nodes, and converting the tree back into XML.
!
! The DOM is often useful for modifying XML documents, because you can
! create a DOM tree, modify it by adding new nodes and moving subtrees
! around, and then produce a new XML document as output. On the other
! hand, while the DOM doesn't require that the entire tree be resident
! in memory at one time, the Python DOM implementation currently keeps
! the whole tree in RAM. This means you may not have enough memory to
! process very large documents as a DOM tree. A SAX handler, on the
! other hand, can potentially churn through amounts of data far larger
! than the available RAM.
\subsection{Getting A DOM Tree}
***************
*** 643,649 ****
offers two alternative implementations of the DOM,
\module{xml.dom.minidom} and \code{4DOM}. \module{xml.dom.minidom} is
! included in Python 2. It is a minimalistic implementation, which means
! it does not provide all interfaces and operations required by the DOM
! standard. \code{4DOM} (XXX reference) is a complete implementation of
DOM Level 2 (which is currently work in progress), so we will use that
in the examples.
--- 645,651 ----
offers two alternative implementations of the DOM,
\module{xml.dom.minidom} and \code{4DOM}. \module{xml.dom.minidom} is
! included in Python 2. It is a minimal implementation, which means it
! does not provide all interfaces and operations required by the DOM
! standard. \code{4DOM} (XXX reference) is a complete implementation of
DOM Level 2 (which is currently work in progress), so we will use that
in the examples.
***************
*** 655,663 ****
input (a file-like object, a string, a file name, and a URL,
respectively). They all return a DOM \class{Document} object.
\begin{verbatim}
import sys
! from xml.dom.ext.reader.Sax import FromXmlStream
! from xml.dom.ext import PrettyPrint
# parse the document
--- 657,665 ----
input (a file-like object, a string, a file name, and a URL,
respectively). They all return a DOM \class{Document} object.
+ % XXX these functions seem to be deprecated! why?
\begin{verbatim}
import sys
! from xml.dom.ext.reader.Sax2 import FromXmlStream
# parse the document
***************
*** 669,675 ****
This HOWTO can't be a complete introduction to the Document Object
Model, because there are lots of interfaces and lots of
! methods. Luckily, the DOM Recommendation is quite a readable document,
! so I'd recommend that you read it to get a complete picture of the
! available interfaces; this will only be a partial overview.
The Document Object Model represents a XML document as a tree of
--- 671,678 ----
This HOWTO can't be a complete introduction to the Document Object
Model, because there are lots of interfaces and lots of
! methods. Luckily, the DOM Recommendation is quite readable as
! specifications go, so I'd recommend that you read it to get a complete
! picture of the available interfaces. This section will only be a
! partial overview.
The Document Object Model represents a XML document as a tree of
***************
*** 678,682 ****
\class{Text}, and \class{Comment}.
! We'll use a single example document throughout this section. Here's the sample:
\begin{verbatim}
--- 681,686 ----
\class{Text}, and \class{Comment}.
! We'll use a single example document throughout this section. Here's
! the sample:
\begin{verbatim}
***************
*** 720,730 ****
This isn't the only possible tree, because different parsers may
differ in how they generate \class{Text} nodes; any of the
! \class{Text} nodes in the above tree might be split into multiple nodes.)
\subsubsection{The \class{Node} class}
! We'll start by considering the basic \class{Node} class. All the
! other DOM nodes --- \class{Document}, \class{Element}, \class{Text},
! and so forth --- are subclasses of \class{Node}. It's possible to
perform many tasks using just the interface provided by \class{Node}.
--- 724,735 ----
This isn't the only possible tree, because different parsers may
differ in how they generate \class{Text} nodes; any of the
! \class{Text} nodes in the above tree might be split into multiple
! nodes.)
\subsubsection{The \class{Node} class}
! We'll start by considering the basic \class{Node} class. All the
! other DOM nodes---\class{Document}, \class{Element}, \class{Text},
! and so forth---are subclasses of \class{Node}. It's possible to
perform many tasks using just the interface provided by \class{Node}.
***************
*** 763,767 ****
\member{documentElement} attribute contains the \class{Element} node
for the root element. The \class{Document} node may have additional
! children, such as \class{ProcessingInstruction} nodes; the complete list of children XXX.
--- 768,773 ----
\member{documentElement} attribute contains the \class{Element} node
for the root element. The \class{Document} node may have additional
! children, such as \class{ProcessingInstruction} nodes; the complete
! list of children XXX.
***************
*** 775,854 ****
Introduction to the walker class
- \subsection{Building A Document}
-
- Intro to builder
-
- \subsection{Processing HTML}
-
- Intro to HTML builder
-
- %Explanations, sample code, ...
-
\subsection{Related Links}
-
- \begin{seealso}
- \seetitle[http://www.w3.org/DOM/]{Document Object Model (DOM)}{The
- World Wide Web Consortium's DOM page.}
- \seetitle[http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/]
- {Document Object Model (DOM) Level 1 Specification}{The DOM
- Level 1 Recommendation. Unlike most standards, this one
- is actually pretty readable, particularly if you're only
- interested in the Core XML interfaces.}
- \end{seealso}
-
-
- \section{Glossary \label{glossary}}
-
- XML has given rise to a sea of acronyms and terms. This section will
- list the most significant terms, and sketch their relevance.
-
- Many of the following definitions are taken from Lars Marius Garshol's
- SGML glossary, at \url{http://www.stud.ifi.uio.no/\~larsga/download/diverse/sgmlglos.html}.
! \begin{definitions}
! \term{DOM (Document Object Model)}
! %
! The Document Object Model is intended to be a platform- and
! language-neutral interface that will allow programs and scripts to
! dynamically access and update the content, structure and style of
! documents. Documents will be represented as tree structures which can
! be traversed and modified.
!
! \term{DTD (Document Type Definition)}
! %
! A Document Type Definition (nearly always called DTD) defines
! an XML document type, complete with element types, entities
! and an XML declaration.
!
! In other words: a DTD completely describes one particular kind
! of XML document, such as, for instance, HTML 3.2.
!
! \term{SAX (Simple API for XML)}
! %
! SAX is a simple standardized API for XML parsers developed by the
! contributors to the xml-dev mailing list. The interface is mostly
! language-independent, as long as the language is object-oriented; the
! first implementation was written for Java, but a Python implementation
! is also available. SAX is supported by many XML parsers.
!
! \term{XML (eXtensible Markup Language)}
! %
! XML is an SGML application profile specialized for use on the
! web and has its own standards for linking and stylesheets under development.
!
! %XML-Data
!
! \term{XSL (eXtensible Style Language)}
! %
! XSL is a proposal for a stylesheet language for XML, which
! enables browsers to lay out XML documents in an attractive
! manner, and also provides a way to convert XML documents to
! HTML.
! \end{definitions}
!
! %\section{Related Links}
! %
! %This section collects all
! %the links from the preceding sections.
\end{document}
--- 781,797 ----
Introduction to the walker class
\subsection{Related Links}
! The DOM Level 1 Recommendation is at
! \url{http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/}. Unlike
! most standards, this one is actually pretty readable, particularly if
! you're only interested in the Core XML interfaces.
!
! Level 2 of the DOM has also been defined, adding more specialized
! features such as support for XML namespaces, events, and ranges. DOM
! Level 3 is still being worked on, and will add yet more features. The
! World Wide Web Consortium's DOM page at \url{http://www.w3.org/DOM/}
! has pointers to the specifications and current drafts for these higher
! DOM levels.
\end{document}
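
The revised DOM section describes building a tree, modifying it by adding nodes and moving subtrees around, and converting it back into XML. A minimal sketch of that workflow follows. It assumes the standard-library \module{xml.dom.minidom} rather than the 4DOM reader used in the HOWTO's own examples, and the sample document and variable names are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, not the one used in the HOWTO.
SOURCE = "<doc><item>first</item><!-- note --><item>second</item></doc>"

# Parse the string into a Document; documentElement is the root Element.
document = minidom.parseString(SOURCE)
root = document.documentElement

# Different kinds of content are represented by different node classes.
kinds = [type(node).__name__ for node in root.childNodes]

# Add a new node: create an Element, give it a Text child, attach it.
new_item = document.createElement("item")
new_item.appendChild(document.createTextNode("third"))
root.appendChild(new_item)

# Move a subtree: appending a node that already has a parent
# detaches it from its old position first.
first = root.firstChild
root.appendChild(first)

# Convert the modified tree back into XML.
result = root.toxml()
print(kinds)
print(result)
```

Because the whole tree is held in memory, this style suits documents of modest size; for very large inputs the SAX approach mentioned in the diff is likely the better fit.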