From: A.M. K. <aku...@us...> - 2001-09-20 21:43:06
Update of /cvsroot/py-howto/pyhowto
In directory usw-pr-cvs1:/tmp/cvs-serv5046
Modified Files:
xml-howto.tex
Log Message:
Massive rewriting and expansion of the introductory section
Index: xml-howto.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/xml-howto.tex,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -r1.13 -r1.14
*** xml-howto.tex 2001/09/20 21:19:05 1.13
--- xml-howto.tex 2001/09/20 21:43:03 1.14
***************
*** 38,159 ****
reasonably simple to implement and use, and is already being used for
specifying markup languages for various new standards: MathML for
! expressing mathematical equations, Synchronized Multimedia
! Integration Language for
! multimedia presentations, and so forth.
SGML and XML represent a document by tagging the document's various
! components with their function, or meaning. For example, an academic
! paper contains several parts: it has a title, one or more authors, an
! abstract, the actual text of the paper, a list of references, and so
! forth. A markup languge for writing such papers would therefore have
! tags for indicating what the contents of the abstract are, what the
! title is, and so forth. This should not be confused with the physical
! details of how the document is actually printed on paper. The
! abstract might be printed with narrow margins in a smaller font than
! the rest of the document, but the markup usually won't be concerned
! with details such as this; other software will translate from the
! markup language to a typesetting language such as \TeX, and will
! handle the details.
A markup language specified using XML looks a lot like HTML; a
document consists of a single \dfn{element}, which contains
sub-elements, which can have further sub-elements inside them.
Elements are indicated by \dfn{tags} in the text. Tags are always
! inside angle brackets \code{<}~\code{>}. There are two forms of
! elements. An element can contain content between opening and closing
tags, as in \code{<name>Euryale</name>}, which is a \element{name}
element containing the data \samp{Euryale}. This content may be text
! data, other XML elements, or a mixture of both. Elements can also be
! empty, in which case they contain nothing, and are represented as a
! single tag ended with a slash, as in \code{<stop/>}, which is an empty
! \element{stop} element. Unlike HTML, XML element names are
! case-sensitive; \element{stop} and \element{Stop} are two different
! element types.
Opening and empty tags can also contain attributes, which specify
! values associated with an element. For example, text such as
\code{<name lang='greek'>Herakles</name>}, the \element{name} element
has a \attribute{lang} attribute which has a value of \samp{greek}.
! This would contrast with \code{<name lang='latin'>Hercules</name>},
! where the attribute's value is \samp{latin}.
! A given XML language is specified with a Document Type Definition, or
! \dfn{DTD}. The DTD declares the element names that are allowed, and
! how elements can be nested inside each other. The DTD also specifies
! the attributes that can be provided for each element, their default
! values, and if they can be omitted. For example, to take an example
! from HTML, the \element{LI} element, representing an entry in a list,
! can only occur inside certain elements which represent lists, such as
! \element{OL} or \element{UL}. A \dfn{validating parser} can be given
! a DTD and a document, and verify whether a given document is legal
! according to the DTD's rules, or determine that one or more rules have
! been violated.
!
! Applications that process XML can be classed into two types. The
! simplest class is an application that only handles one particular
! markup language. For example, a chemistry program may only need to
! process Chemical Markup Language, but not MathML. This
! application can therefore be written specifically for a single DTD,
! and doesn't need to be capable of handling multiple markup
! languages. This type is simpler to write, and can easily be
! implemented with the available Python software.
!
! The second type of application is less common, and has to be able to
! handle any markup language you throw at it. An example might be a
! smart XML editor that helps you to write XML that conforms to a
! selected DTD; it might do so by not letting you enter an element where
! it would be illegal, or by suggesting elements that can be placed at
! the current cursor location. Such an application needs to handle any
! possible XML-defined markup, and therefore must be able to obtain a
! data structure embodying the DTD in use. XXX This type of application
! can't currently be implemented in Python without difficulty (XXX but
! wait and see if a DTD module is included...)
! For the full details of XML's syntax, the one definitive source is the
! XML 1.0 specification, available on the Web at
\url{http://www.w3.org/TR/xml-spec.html}. However, like all
! specifications, it's quite formal and isn't intended to be a friendly
introduction or a tutorial. The annotated version of the standard, at
! \url{http://www.xml.com/xml/pub/axml/axmlintro.html}, is quite helpful
! in clarifying the specification's intent. There are also various
! informal tutorials and books available to introduce you to XML.
!
! The rest of this HOWTO will assume that you're familiar with the
! relevant terminology. Most sections will use XML terms such as
! \emph{element} and \emph{attribute}; section~\ref{DOM} on the Document
! Object Model will assume that you've read the relevant Working Draft,
! and are familiar with things like Iterators and Nodes.
! Section~\ref{SAX} does not require that you have experience with the
! Java SAX implentations.
! \subsection{Related Links}
! \begin{seealso}
! \seetitle[http://www.w3.org/XML/]{Extensible Markup Language (XML)}
! {The World Wide Web Consortium's main page leading to
! documents relating to XML. Start here for background
! information and specifications.}
! \seetitle[http://www.oasis-open.org/cover/sgml-xml.html]{The XML
! Cover Pages}{Perhaps the leading index of information on
! XML and SGML.}
! \end{seealso}
\section{Installing the XML Toolkit}
! Windows users should get the precompiled version at
! \url{http://sourceforge.net/projects/pyxml}; Mac users will use the
! corresponding precompiled version at \url{XXX}. Linux users may wish
! to use either the Debian package from \url{XXX}, or the RPM from
! \url{http://sourceforge.net/projects/pyxml}. To compile from source
! on a \UNIX{} platform, simply perform the following steps.
\begin{enumerate}
- \item If you have are using Python 1.5, you need to install the
- distutils first, which are available from
- \url{http://www.python.org/sigs/distutils-sig}. Python 1.6 and later
- already includes the distutils, so you can skip this step.
! \item Get a copy of the source distribution from
\url{http://sourceforge.net/projects/pyxml}. Unpack it with the
following command.
--- 38,318 ----
reasonably simple to implement and use, and is already being used for
specifying markup languages for various new standards: MathML for
! expressing mathematical equations, Synchronized Multimedia Integration
! Language for multimedia presentations, and so forth.
SGML and XML represent a document by tagging the document's various
! components with their function or meaning. For example, a book
! contains several parts: it has a title, one or more authors, the text
! of the book, perhaps a preface or an index, and so forth. A markup
! language for writing books would therefore have elements indicating
! what the contents of the preface are, what the title is, and so forth.
! This should not be confused with the physical details of how the
! document is actually printed on paper. The index might be printed
! with narrow margins in a smaller font than the rest of the book, but
! markup usually isn't (or shouldn't be, anyway) concerned with details
! such as this. Instead, other software will translate from the markup
! language to a typesetting language such as \TeX, handling the
! presentation details.
!
! This section provides a brief overview of XML and a few related
! standards. It's far from complete, since a complete treatment would
! require a full-length book rather than a short HOWTO. There's no
! better way to get a completely accurate description than to read the
! original W3C Recommendations; you can find links to them in
! section~\ref{xml-links}, ``Related Links''. If you already know what
! XML is, you can skip the rest of this section.
!
! Later sections of this HOWTO assume that you're familiar with XML
! terminology. Most sections will use XML terms such as \emph{element}
! and \emph{attribute}. Section~\ref{SAX} does not require that you
! have experience with the Java SAX implementations.
+
+ \subsection{Elements, Attributes and Entities}
+
A markup language specified using XML looks a lot like HTML; a
document consists of a single \dfn{element}, which contains
sub-elements, which can have further sub-elements inside them.
Elements are indicated by \dfn{tags} in the text. Tags are always
! inside angle brackets \code{<}~\code{>}. Elements can either contain content, or they can be empty:
!
! \begin{itemize}
!
! \item An element can contain content between opening and closing
tags, as in \code{<name>Euryale</name>}, which is a \element{name}
element containing the data \samp{Euryale}. This content may be text
! data, other XML elements, or a mixture of both.
!
! \item Elements can also be empty, containing nothing, and are
! represented as a single tag ended with a slash. For example,
! \code{<stop/>} is an empty \element{stop} element. Unlike HTML, XML
! element names are case-sensitive; \element{stop} and \element{Stop}
! are two different element types.
!
! \end{itemize}
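To make the two forms concrete, here is a minimal sketch using the \code{xml.dom.minidom} module from a current Python 3 standard library (an assumption; it is not part of the Python/XML package this HOWTO describes). The \element{route} wrapper element is hypothetical, added only so that the sample has a single root element.

```python
from xml.dom import minidom

# Hypothetical sample: a <route> root holding one element with text
# content and one empty element.
doc = minidom.parseString("<route><name>Euryale</name><stop/></route>")

name = doc.getElementsByTagName("name")[0]
stop = doc.getElementsByTagName("stop")[0]

# The content of <name> is a single text node.
name_text = name.firstChild.data

# <stop/> is empty: it has no child nodes at all.
stop_is_empty = not stop.hasChildNodes()

# Element names are case-sensitive, so looking up 'Stop' finds nothing.
matches_for_Stop = doc.getElementsByTagName("Stop")
```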
Opening and empty tags can also contain attributes, which specify
! values associated with an element. For example, in the XML text
\code{<name lang='greek'>Herakles</name>}, the \element{name} element
has a \attribute{lang} attribute which has a value of \samp{greek}.
! Contrast this with \code{<name lang='latin'>Hercules</name>}, where
! the attribute's value is \samp{latin}.
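The same \code{xml.dom.minidom} module (again assuming a modern Python 3 standard library) can read attribute values. The \element{names} wrapper and the \attribute{dialect} lookup are hypothetical, chosen only to show what happens when an attribute is absent.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<names><name lang='greek'>Herakles</name>"
    "<name lang='latin'>Hercules</name></names>")

# Map each element's lang attribute to its text content.
by_lang = {}
for elem in doc.getElementsByTagName("name"):
    by_lang[elem.getAttribute("lang")] = elem.firstChild.data

# getAttribute() returns an empty string when the attribute is absent.
missing = doc.getElementsByTagName("name")[0].getAttribute("dialect")
```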
!
! XML also includes \dfn{entities} as a shorthand for including a
! particular character or a longer string. Entity references always
! begin with a \samp{\&} and end with a \samp{;}. For example, a
! particular Unicode character can be written as \code{\&\#4660;} using
! its character code in decimal, or as \code{\&\#x1234;} using
! hexadecimal. It's also possible to define your own entities, making
! \code{\&title;} expand to ``The Odyssey'', for example. If you want to
! include the \samp{\&} character in XML content, it must be written as
! \code{\&amp;}.
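As a quick check of the reference syntax, the following sketch (assuming Python 3's \code{xml.dom.minidom}; the \element{t} element is hypothetical) parses a decimal character reference, a hexadecimal one, and \code{\&amp;}, and shows that the parser has already replaced each of them with the character it stands for.

```python
from xml.dom import minidom

# &#4660; (decimal) and &#x1234; (hexadecimal) refer to the same
# character, U+1234; &amp; stands for a literal ampersand.
doc = minidom.parseString("<t>&#4660;|&#x1234;|&amp;</t>")

# Join all text children in case the parser splits the character data.
text = "".join(node.data for node in doc.documentElement.childNodes)

print(text == "\u1234|\u1234|&")  # True
```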
!
! \subsection{Well-Formed XML}
!
! A legal XML document must, as a minimum, be \dfn{well-formed}: each
! open tag must have a corresponding closing tag, and tags must nest
! properly. For example, \code{<b><i>text</b></i>} is not well-formed
! because the \element{i} element should be enclosed inside the
! \element{b} element, but instead the closing \code{</b>} tag is
! encountered first. This example can be made well-formed by swapping
! the order of the closing tags, resulting in \code{<b><i>text</i></b>}.
!
! If you've ever written HTML by hand, you may have acquired the habit
! of being a bit sloppy about this. Strictly speaking, HTML also
! requires tags to nest properly (although it allows some closing tags,
! such as \code{</p>}, to be omitted), but most Web browsers are
! very forgiving of errors in HTML. This is convenient for HTML
! authors, but it makes it difficult to write programs to parse HTML
! input, because the programs have to cope with all sorts of malformed
! input.
!
! The authors of the XML specification didn't want XML to fall into the
! same trap, because it would make XML processing software much harder
! to write. Therefore, all XML parsers have to be strict and must
! report an error if their input isn't well-formed. The Expat parser
! includes an executable program named \program{xmlwf} that parses the
! contents of files and reports any well-formedness violations; it's
! very handy for checking XML data that's been output from a program or
! written by hand.
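\program{xmlwf} is the tool to reach for, but a rough Python equivalent is easy to sketch with the \code{xml.parsers.expat} module. The function \code{check_well_formed()} below is a hypothetical helper, written for Python 3, that returns \code{None} for well-formed input and a short error description otherwise.

```python
import xml.parsers.expat


def check_well_formed(text):
    """Return None if text is well-formed XML, else an error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # isfinal=True tells Expat that this is the entire document.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return "line %d, column %d: %s" % (err.lineno, err.offset, reason)
    return None


ok = check_well_formed("<b><i>text</i></b>")    # None
bad = check_well_formed("<b><i>text</b></i>")   # describes the bad nesting
```

Because Expat stops at the first violation, this reports only one error per document, much as a strict XML parser is required to.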
!
!
! \subsection{DTDs}
!
! Well-formedness just says that all tags nest properly and that every
! opening tag is matched by a closing tag. It says nothing about the
! order of elements or about which elements can be contained inside other
! elements.
!
! The following XML, apparently representing a book, is well-formed but
! it makes no logical sense:
!
! \begin{verbatim}
! <book>
! <index> ... </index>
! <chapter> ... </chapter>
! <chapter> ... </chapter>
! <abstract> ... </abstract>
! <chapter> ... </chapter>
! <preface> ... </preface>
! </book>
! \end{verbatim}
! Prefaces don't come at the end of books, the index doesn't belong at
! the front, and the abstract doesn't belong in the middle.
! Well-formedness alone doesn't provide any way of enforcing that order.
! You could write a Python program that took an XML file like this and
! checked whether all the parts are in order, but then someone wanting
! to understand what documents are legal would have to read your program.
!
! Document Type Definitions, or \dfn{DTDs} for short, are a more concise
! way of enforcing ordering and nesting rules. A DTD declares the
! element names that are allowed, and how elements can be nested inside
! each other. To take an example from HTML, the \element{LI} element,
! representing an entry in a list, can only occur inside certain
! elements which represent lists, such as \element{OL} or \element{UL}.
!
! The DTD also specifies the attributes that can be provided for each
! element, the default value for each attribute, and whether the
! attribute can be omitted. A \dfn{validating parser} can take a
! document and a DTD, and check whether the document is legal according
! to the DTD's rules.
!
! Note that it's quite possible to get useful work done without using a
! validating parser and writing a DTD. You might decide that just
! writing well-formed XML and checking it with a Python program is all
! you need.
!
! A DTD lists the supported elements, the order in which elements must
! occur, and the possible attributes for each element. Here's a
! fragment from an imaginary DTD for writing books:
! \begin{verbatim}
! <!ELEMENT book (abstract?, preface, chapter*, appendix?)>
! <!ELEMENT abstract ...>
! <!ELEMENT chapter ...>
! <!ATTLIST chapter id ID #REQUIRED
! title CDATA #IMPLIED>
! \end{verbatim}
!
! The first line declares the \element{book} element, and specifies the
! elements that can occur inside it and the order in which the
! subelements must be provided. DTDs borrow from regular expression
! notation in order to express how elements can be repeated; \samp{?}
! means an element must occur 0 or 1 times, \samp{*} is 0 or more times,
! and \samp{+} means the element must occur 1 or more times. For
! example, the \element{abstract} and \element{appendix} elements are
! optional inside a \element{book} element. Exactly one
! \element{preface} element has to be present, and it can be followed by
! any number of \element{chapter} elements; having no chapters at all
! would be legal.
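The regular-expression analogy can be taken literally. The sketch below (Python 3, with \code{children_match_book_model()} as a hypothetical helper name) collects the names of a \element{book} element's children and matches them against a regular expression equivalent to the \element{book} content model declared above. It only illustrates the notation: it is not a replacement for a validating parser, and it ignores everything except the order and number of child elements.

```python
import re
from xml.dom import minidom

# (abstract?, preface, chapter*, appendix?) rewritten as a regular
# expression over a space-terminated list of child element names.
BOOK_MODEL = re.compile(r"(abstract )?preface (chapter )*(appendix )?")


def children_match_book_model(xml_text):
    root = minidom.parseString(xml_text).documentElement
    # Keep only element children; whitespace text nodes are skipped.
    names = "".join(child.tagName + " "
                    for child in root.childNodes
                    if child.nodeType == child.ELEMENT_NODE)
    return BOOK_MODEL.fullmatch(names) is not None
```

With \code{<book><chapter/><preface/></book>}, for instance, the function returns \code{False}, because the content model requires \element{preface} to come before any \element{chapter}.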
!
! The \code{ATTLIST} declaration specifies attributes for the
! \element{chapter} element. Chapters can have two attributes,
! \attribute{id} and \attribute{title}. \attribute{title} contains
! character data (CDATA) and is optional (that's what \samp{\#IMPLIED}
! means, for obscure historical reasons). \attribute{id} is required
! (\samp{\#REQUIRED}) and must contain an ID value, a name that must be
! unique among all the ID values in the document.
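A program that doesn't use a validating parser would have to enforce these \code{ATTLIST} rules by hand. Here is a minimal Python 3 sketch of that for \element{chapter}, with \code{check_chapter_attributes()} as a hypothetical helper name. It relies on the XML 1.0 rule that an ID value may appear only once in a document, and it deliberately skips checking that each ID is a legal XML name or that IDs on other element types don't collide.

```python
from xml.dom import minidom


def check_chapter_attributes(xml_text):
    """Hand-check the ATTLIST rules for <chapter>; return a list of problems."""
    problems = []
    seen_ids = set()
    doc = minidom.parseString(xml_text)
    for n, chapter in enumerate(doc.getElementsByTagName("chapter"), start=1):
        # id is #REQUIRED, so a missing id is an error.
        if not chapter.hasAttribute("id"):
            problems.append("chapter %d: missing required id" % n)
            continue
        ident = chapter.getAttribute("id")
        # ID values must be unique; this sketch only tracks chapter IDs.
        if ident in seen_ids:
            problems.append("chapter %d: duplicate id %r" % (n, ident))
        seen_ids.add(ident)
        # title is #IMPLIED, so its absence is not an error.
    return problems
```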
!
! A validating parser could take this DTD and a sample document, and
! report whether the document is \dfn{valid} according to the rules of
! the DTD. Roughly speaking, a document is valid if all the elements
! occur in the right order and in the right number of repetitions, and
! their attributes satisfy the DTD's \code{ATTLIST} declarations.
!
! \subsection{Related Links}
! \label{xml-links}
!
! For the full details of XML's syntax, the definitive source is the XML
! 1.0 specification, available on the Web at
\url{http://www.w3.org/TR/xml-spec.html}. However, like all
! specifications it's quite formal and isn't intended to be a friendly
introduction or a tutorial. The annotated version of the standard, at
! \url{http://www.xml.com/xml/pub/a/axml/axmlintro.html}, is quite helpful
! in clarifying the specification's intent. There are also many more
! informal tutorials and books available to introduce you to XML at
! greater length.
! The XML Cover Pages, at \url{http://xml.coverpages.org}, are an
! extensive collection of links to XML and SGML resources, including a
! news page that's updated every few days. If you can only remember one
! XML-related URL, remember this one. Cafe con Leche,
! at \url{http://www.ibiblio.org/xml/}, is another good resource.
!
! The xml-dev mailing list is a high-traffic list for implementation and
! development; see \url{http://www.xml.org/xml/xmldev.shtml} for
! archives and subscription information. Be warned: some people might
! find the discussion too focused on inventing new standards and tools,
! not on applying existing standards.
!
!
! \section{Related Standards}
! XML 1.0 is the basic standard, but people have built many, \emph{many}
! additional standards and tools on top of XML or to be used with XML.
! This section will quickly introduce some of these related
! technologies, paying particular attention to those that are supported
! by the Python/XML package.
+ %\subsection{XPath and XPointer}
+ %XXX write section on XPath and XPointer
+
+ \subsection{XSLT}
+
+ XML documents are often transformed from one format to another. These
+ transformations can be minor, such as changing all \element{OL}
+ elements into \element{UL} elements, or major, such as translating a
+ DocBook document into HTML so it can be displayed in a Web browser.
+ You can write a separate Python program to do each transformation as
+ you need it, and at times that will be the most appropriate option,
+ but an alternative approach is to use XSL, the Extensible Stylesheet
+ Language.
+
+ XSL is really two standards: XSLT, XSL Transformations; and XSL-FO,
+ XSL Formatting Objects. XSLT is used much more often than XSL-FO,
+ because XSL-FO is intended primarily for rendering XML for printing
+ onto paper. XSLT is a general tool for transforming one XML document
+ into another document, and therefore can be used for more diverse
+ tasks.
+
+ To use XSLT, you have to write a \dfn{stylesheet}, which is itself an
+ XML document written in the XSLT DTD. The source document is turned
+ into a tree structure, and the stylesheet specifies the transformation
+ you want to perform by selecting some elements from the tree and
+ rearranging them.
+
+ The Python/XML package includes 4XSLT, an XSLT processor written by
+ Fourthought, Inc., letting you write Python programs that apply an
+ XSLT stylesheet to a document.
+
+ \subsubsection{Related Links}
+
+ The W3C's XSL page is at \url{http://www.w3.org/Style/XSL/}, and links
+ to the XSLT specifications and to friendlier tutorials.
+
+
+ %\subsection{XML Schemas}
+
+ % XXX write a section on XSchema
+
+
+ %\subsection{RDF}
+
+ % XXX write a section on RDF; point users toward Redfoot.
+
+
\section{Installing the XML Toolkit}
+
+ Releases are available from
+ \url{http://sourceforge.net/projects/pyxml/}.
+ Windows users should download the appropriate precompiled version.
+ Linux users can either download an RPM, or install from source. Users
+ on other platforms have no choice but to install from source.
! To compile from source on a \UNIX{} platform, simply perform the
! following steps.
\begin{enumerate}
! \item Download the latest version of the source distribution from
\url{http://sourceforge.net/projects/pyxml}. Unpack it with the
following command.
***************
*** 163,200 ****
\end{verbatim}
! \item
! Run:
! \begin{verbatim}
! python setup.py install
! \end{verbatim}
!
! To properly execute this operation, a C compiler is required - the
! same that was used to build Python itself. On a Unix system, this
! operation may require superuser permissions. \code{setup.py} supports
! a number of different commands and options, invoke \code{setup.py}
! without any arguments to obtain help.
\end{enumerate}
If you have difficulty installing this software, send a problem report
! to <xm...@py...> describing the problem, or submit a bug report
at \url{http://sourceforge.net/projects/pyxml}.
There are various demonstration programs in the \file{demo/} directory
! of the source distribution. You may wish to look at them next to get
! an impression of what's possible with the XML tools, and as a source
of example code.
- % package layout
\subsection{Related Links}
-
- \begin{seealso}
- \seetitle[http://www.python.org/topics/xml/]{Python and XML
- Processing}{This is the starting point for Python-related
- XML topics; it is updated to refer to all software,
- mailing lists, documentation, etc.}
- \end{seealso}
\section{SAX: The Simple API for XML}
--- 322,359 ----
\end{verbatim}
! \item Run \code{python setup.py install}. In order to run this,
! you'll need to have a C compiler installed, and it should be the same
! one that was used to build your Python installation. On a Unix system,
! this operation may require superuser permissions. \code{setup.py}
! supports a number of different commands and options; invoke
! \code{setup.py} without any arguments to see a help message.
\end{enumerate}
+ If you're using Python 1.5, you'll have to install the Distutils
+ first, available from
+ \url{http://www.python.org/sigs/distutils-sig}. The PyXML package
+ still works with Python 1.5.2, but the Unicode support is much less
+ powerful. Versions of Python after 1.5.2 added very complete Unicode
+ support, so if you're going to be doing serious XML work you should
+ use the latest version of Python 2.x. At this writing, the latest
+ released version is Python 2.1.
+
If you have difficulty installing this software, send a problem report
! to the XML-SIG mailing list describing the problem, or submit a bug report
at \url{http://sourceforge.net/projects/pyxml}.
There are various demonstration programs in the \file{demo/} directory
! of the Python/XML source distribution. You may wish to look at them
! to get an idea of what's possible with the XML tools, and as a source
of example code.
\subsection{Related Links}
+ The Python/XML Topic Guide, at
+ \url{http://pyxml.sourceforge.net/topics/}, is the starting point for
+ Python-related XML topics; it links to software, mailing lists,
+ documentation, etc.
\section{SAX: The Simple API for XML}