From: Grant M. <gr...@us...> - 2003-10-14 09:13:51
Update of /cvsroot/perl-xml/perl-xml-faq
In directory sc8-pr-cvs1:/tmp/cvs-serv26784

Modified Files:
    faq-style.xsl faq.css perl-xml-faq.xml
Log Message:
- added XML::Validator::Schema section
- updated XML::XSLT status
- rewrote Perl 5.8 section in the present tense :-)
- rewrote html form section
- fixed DOCTYPE for DocBook 4.2
- corrected SYSTEM URI for use with catalog
- added paragraph on pull parsing
- numerous minor tweaks
perl-xml-faq.xml

Index: faq-style.xsl
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/faq-style.xsl,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -d -r1.4 -r1.5
--- faq-style.xsl 19 Jun 2002 21:30:29 -0000 1.4
+++ faq-style.xsl 14 Oct 2003 09:13:47 -0000 1.5
@@ -4,10 +4,23 @@
   xmlns="http://www.w3.org/TR/xhtml1/transitional"
   exclude-result-prefixes="#default">
 
-<!-- This is where I chose to install the DocBook XSL Stylesheets -->
-<!-- from: http://docbook.sourceforge.net/projects/xsl/index.html -->
+<!--
 
-<xsl:import href="/usr/share/xml/docbook/xslt/html/docbook.xsl"/>
+  This stylesheet merely imports the Docbook XSL stylesheets and sets a few
+  parameters. Download the stylesheets from:
+
+    http://docbook.sourceforge.net/projects/xsl/index.html
+
+  Unpack them onto your system and set up a catalog entry to map the URI of
+  the 'current' release to the directory where you unpacked it, eg:
+
+    <rewriteURI
+      uriStartString="http://docbook.sourceforge.net/release/xsl/current"
+      rewritePrefix="docbook-xsl-1.62.0" />
+-->
+
+
+<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>
 
 <!-- Parameter settings -->
 
@@ -21,8 +34,6 @@
 </xsl:param>
 
 
-
-
 <!-- Templates to override defaults -->
 
 <xsl:template match="question/para">
@@ -33,11 +44,39 @@
   <br /><b><xsl:value-of select="." /></b><br />
 </xsl:template>
 
-<!-- Why didn't this work?
-<xsl:template match="revhistory">
-  <p>Revision History Here</p>
+<xsl:template match="revhistory" mode="titlepage.mode">
+  <div class="revhistory">
+    <p class="title"><b>Last updated:</b>
+      <xsl:text> </xsl:text>
+      <xsl:call-template name="monthname">
+        <xsl:with-param name="monthnum" select="substring(./revision/date, 13, 2)"/>
+      </xsl:call-template>
+      <xsl:text> </xsl:text>
+      <xsl:value-of select="substring(./revision/date, 16, 2)" />
+      <xsl:text>, </xsl:text>
+      <xsl:value-of select="substring(./revision/date, 8, 4)" />
+    </p>
+  </div>
+</xsl:template>
+
+<xsl:template name="monthname">
+  <xsl:param name="monthnum">0</xsl:param>
+  <xsl:choose>
+    <xsl:when test="$monthnum = 1">January</xsl:when>
+    <xsl:when test="$monthnum = 2">February</xsl:when>
+    <xsl:when test="$monthnum = 3">March</xsl:when>
+    <xsl:when test="$monthnum = 4">April</xsl:when>
+    <xsl:when test="$monthnum = 5">May</xsl:when>
+    <xsl:when test="$monthnum = 6">June</xsl:when>
+    <xsl:when test="$monthnum = 7">July</xsl:when>
+    <xsl:when test="$monthnum = 8">August</xsl:when>
+    <xsl:when test="$monthnum = 9">September</xsl:when>
+    <xsl:when test="$monthnum = 10">October</xsl:when>
+    <xsl:when test="$monthnum = 11">November</xsl:when>
+    <xsl:when test="$monthnum = 12">December</xsl:when>
+    <xsl:otherwise><xsl:value-of select="$monthnum"/></xsl:otherwise>
+  </xsl:choose>
 </xsl:template>
---->
 
 
 </xsl:stylesheet>

Index: faq.css
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/faq.css,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- faq.css 17 Apr 2002 20:46:36 -0000 1.2
+++ faq.css 14 Oct 2003 09:13:47 -0000 1.3
@@ -1,45 +1,50 @@
 BODY {
   background: #FFFFFF;
   font-family: Verdana, Arial, Helvetica, sans-serif;
-  font-size: 10pt;
-  font-weight: normal;
-}
-
-TD {
-  background: #FFFFFF;
-  font-family: Verdana, Arial, Helvetica, sans-serif;
-  font-size: 10pt;
+  font-size: 90%;
   font-weight: normal;
 }
-
-TH {
-  background: #FFFFFF;
-  font-family: Verdana, Arial, Helvetica, sans-serif;
-  font-size: 10pt;
-  font-weight: bold;
-}
-
-.programlisting {
-  padding-top: 10;
-  padding-left: 8;
-  background-color: #ffffe0;
-}
 
 H1.title {
-  padding: 6;
+  padding: 0.2em;
   border-style: solid;
-  border-width: 2;
+  border-width: 2px;
   border-color: #eeeeee;
 }
 
 H3.title {
-  padding: 4;
-  margin-top: 20;
+  padding: 0.4em;
+  margin-top: 2em;
   border-style: solid;
-  border-width: 2;
+  border-width: 2px;
   border-color: #eeeeee;
 }
 
+DIV.abstract {
+  margin: 2em;
+  padding: 1em;
+  background-color: #eeeeee;
+}
+
+DIV.abstract P.title {
+  font-size: 120%;
+}
+
 DIV.revhistory {
   width: 400px;
 }
+
+TR.question TD {
+  padding-top: 1.0em;
+}
+
+TT {
+  font-size: 120%;
+}
+
+.programlisting {
+  padding-top: 0.8em;
+  padding-left: 0.8em;
+  background-color: #ffffe0;
+}
+

Index: perl-xml-faq.xml
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/perl-xml-faq.xml,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -d -r1.9 -r1.10
--- perl-xml-faq.xml 19 Jun 2002 21:29:41 -0000 1.9
+++ perl-xml-faq.xml 14 Oct 2003 09:13:47 -0000 1.10
@@ -1,6 +1,6 @@
 <?xml version="1.0" encoding="utf-8" standalone="no"?>
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2b1//EN"
-  "file:///usr/share/xml/docbook/4.2b1/docbookx.dtd"
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
 >
 
 <article class="faq">
@@ -16,6 +16,7 @@
     </author>
     <copyright>
       <year>2002</year>
+      <year>2003</year>
       <holder>Grant McLean</holder>
     </copyright>
 
@@ -32,7 +33,7 @@
     most common question for beginners - "Where do I start?"</para>
 
     <para>The official home for this document on the web is:
-    <ulink url="http://www.perlxml.net/perl-xml-faq.dkb">http://www.perlxml.net/perl-xml-faq.dkb</ulink>.
+    <ulink url="http://perl-xml.sourceforge.net/faq/">http://perl-xml.sourceforge.net/faq/</ulink>.
     The official source for this document is in CVS on
     <ulink url="http://www.sourceforge.net/">SourceForge</ulink> at
     <ulink url="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/perl-xml/perl-xml-faq/"
@@ -365,6 +366,16 @@
     stream style. You configure the parser and it gives you the document in
     chunks (bits of the tree or 'twigs').</para>
 
+    <para>Finally, the latest trendy buzzword in Java and C# circles is
+    'pull' parsing (see <ulink url="http://www.xmlpull.org/"
+    >www.xmlpull.org</ulink>). Unlike SAX, which 'pushes' events at your
+    code, the pull paradigm allows your code to ask for the next bit when
+    it's ready. This approach is reputed to allow you to structure your code
+    more around the data rather than around the API. Eric Bohlman's
+    <classname>XML::TokeParser</classname> offers a simple but powerful
+    pull-based API on top of <classname>XML::Parser</classname>. There
+    are currently no Perl implementations of the XMLPULL API.</para>
+
   </answer>
 </qandaentry>
 
@@ -487,7 +498,7 @@
     is no API for finding or transforming nodes. This module is also not
     suitable for working with 'mixed content'.
     <classname>XML::Simple</classname> has it's own <ulink
-    url="http://web.co.nz/~grantm/cpan/xmlsimple/faq.html">frequently asked
+    url="http://search.cpan.org/dist/XML-Simple/lib/XML/Simple/FAQ.pod">frequently asked
     questions</ulink> document.</para>
 
     <para>Although <classname>XML::Simple</classname> uses a tree-style, the
@@ -887,13 +898,13 @@
   </question>
   <answer>
 
-    <para>This module aims to implement XSLT in Perl, so the good news is
-    that so long as you have <classname>XML::Parser</classname> working you
-    won't need to compile anything to install this module. The bad news is
-    that it is not a complete implementation of the XSLT spec, it is still in
-    'alpha' state and it's not clear whether it is under active development.
-    The <classname>XML::XSLT</classname> distribution includes a script you
-    can use from the command line like this:</para>
+    <para>This module aims to implement XSLT in Perl, so as long as you have
+    <classname>XML::Parser</classname> working you won't need to compile
+    anything to install it. The implementation is not complete, but work is
+    continuing and you can join the fun at the project's <ulink
+    url="http://xmlxslt.sourceforge.net/">SourceForge page</ulink>. The
+    <classname>XML::XSLT</classname> distribution includes a script you can
+    use from the command line like this:</para>
 
     <programlisting><![CDATA[
 xslt-parser -s toc-links.xsl perl-xml-faq.xml > toc.html
@@ -904,13 +915,6 @@
     Introduction to Perl's XML::XSLT module</ulink> at <ulink
     url="http://www.linuxfocus.org/">linuxfocus.org</ulink>.</para>
 
-    <para>Some people have experienced difficulty installing the latest
-    version of this module - possibly since maintenance has been handled by
-    multiple people. At the time of writing, the latest version was
-    <filename>J/JS/JSTOWE/XML-XSLT-0.40.tar.gz</filename> although CPAN.pm
-    would only find
-    <filename>B/BR/BRONG/XML-XSLT-0.32.tar.gz</filename>.</para>
-
   </answer>
 </qandaentry>
 
@@ -1170,7 +1174,7 @@
 
 <qandaentry id="utf_perl_5_6">
   <question>
-    <para>What can Perl do with a UTF8 string?</para>
+    <para>What can Perl do with a UTF-8 string?</para>
   </question>
   <answer>
 
@@ -1231,34 +1235,28 @@
 
 <qandaentry id="utf_perl_5_8">
   <question>
-    <para>What will Perl 5.8 do with a UTF8 string?</para>
+    <para>What can Perl 5.8 do with a UTF-8 string?</para>
   </question>
   <answer>
 
-    <para>The Unicode support in Perl 5.6 is not complete and many of the
-    shortcomings will be fixed in Perl 5.8. One major leap forward in 5.8
-    will be the move to Perl IO and 'layers' which will allow translations to
-    take place as file handles are read from or written to. A built-in
-    layer called ':encoding' will automatically translate data to UTF-8 as it
-    is read, or to some other encoding as it is written. For example, given
-    a UTF-8 string, this code will write it out to a file as
-    ISO-8859-1:</para>
+    <para>The Unicode support in Perl 5.6 had a number of omissions and bugs.
+    Many of the shortcomings were fixed in Perl 5.8 and 5.8.1. One major
+    leap forward in 5.8 was the move to Perl IO and 'layers' which allows
+    translations to take place transparently as file handles are read from or
+    written to. A built-in layer called ':encoding' will automatically
+    translate data to UTF-8 as it is read, or to some other encoding as it is
+    written. For example, given a UTF-8 string, this code will write it out
+    to a file as ISO-8859-1:</para>
 
     <programlisting><![CDATA[
-open($fh,'>:encoding(iso-8859-1)', $path) || die "open($path): $!";
+open($fh,'>:encoding(iso-8859-1)', $path) or die "open($path): $!";
 $fh->print($utf_string);
 ]]></programlisting>
 
-    <para>File handle operations will also be applicable to in-memory 'files'
-    held in Perl scalars.</para>
-
-    <para>New built-in functions will allow you to check the utf8 flag
-    on scalars and convert utf-8 strings to and from byte strings.</para>
-
-    <para>The core 5.8 distribution will also include a number of new modules
-    in the Unicode:: namespace. Supported operations will include querying
-    the Unicode Character Database, sorting using Unicode collating rules
-    and normalising Unicode character forms.</para>
+    <para>The new core module 'Encode' can be used to translate between
+    encodings (but since that usually only makes sense during IO, you might
+    as well just use layers) and also provides the 'is_utf8' function for
+    accessing the UTF-8 flag on a string.</para>
 
   </answer>
 </qandaentry>
@@ -1322,9 +1320,8 @@
 
     <formalpara>
       <title>Perl 5.8 IO layers</title>
-      <para>At the time of writing this document, Perl 5.8 had not been
-      released but when it is you'll be able to specify an encoding
-      translation layer as you open a file like this:</para>
+      <para>You can specify an encoding translation layer as you open a file
+      like this:</para>
     </formalpara>
 
 
@@ -1333,8 +1330,8 @@
 $fh->print($utf_string);
 ]]></programlisting>
 
-      <para>You'll also be able to push an encoding layer onto an already
-      open filehandle like this:</para>
+      <para>You can also push an encoding layer onto an already open filehandle
+      like this:</para>
 
     <programlisting><![CDATA[
 binmode(STDOUT, ':encoding(windows-1250)');
@@ -1502,13 +1499,15 @@
       control characters with printable characters. For strict Latin1 text it
       shouldn't matter, but if your text contains 'smart quotes', daggers,
       bullet characters, the Trade Mark or the Euro symbols it's not
-      iso-8859-1.</para>
+      iso-8859-1. <classname>XML::Parser</classname> version 2.32 and later
+      include a CP1252 mapping which can be used with documents bearing this
+      declaration:</para>
     </formalpara>
 
-<!--
-    <para>FIXME: Is there a cp1252 encoding map?</para>
--->
+    <programlisting><![CDATA[
+<?xml version='1.0' encoding='WINDOWS-1252' ?>
+    ]]></programlisting>
 
   </answer>
 
@@ -1517,27 +1516,34 @@
     <formalpara>
       <title>Web Forms</title>
 
-      <para>If your script accepts text from a web form, you have no way of
-      knowing what encoding the client system was using. If you save the data
-      to an XML file, random high characters in the data may cause the file to
-      not be 'well-formed'.</para>
+      <para>If your Perl script accepts text from a web form, you are at the
+      mercy of the client browser as to what encoding is used - if you save the
+      data to an XML file, random high characters in the data may cause the
+      file to not be 'well-formed'. A common convention is for browsers to
+      look at the encoding on the page which contains the form and to translate
+      data into that encoding before posting. You declare an encoding by using
+      a 'charset' parameter on the Content-type declaration, either in the
+      header:</para>
 
     </formalpara>
 
-    <para>A good starting point is probably to include an XML declaration
-    which specifies iso-8859-1 encoding. By doing this, you are stating your
-    assumption that characters in the range 0x00-0x7F are ASCII and
-    characters in the range 0xA0-0xFF are Latin1. It's probably not safe to
-    stop there though.</para>
+    <programlisting><![CDATA[
+print CGI->header('text/html; charset=utf-8');
+    ]]></programlisting>
 
-    <para>If the user submits characters in the range 0x80-0x9F they are
-    unlikely to be ISO Latin1. You can't just assume this won't happen
-    as it's remarkably common for users to prepare text in Microsoft
-    Word and copy/paste it into a web form. If they have the 'smart quotes'
-    option enabled, the text may contain WinLatin1 characters. The following
-    routine can be used to 'sanitise' the data by replacing 'smart'
-    characters with their common ASCII equivalents and discarding other
-    troublesome characters.</para>
+    <para>or in a meta tag in the document itself:</para>
+
+    <programlisting><![CDATA[
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+    ]]></programlisting>
+
+    <para>If you find you've received characters in the range 0x80-0x9F, they
+    are unlikely to be ISO Latin1. This commonly results from users
+    preparing text in Microsoft Word and copying/pasting it into a web form.
+    If they have the 'smart quotes' option enabled, the text may contain
+    WinLatin1 characters. The following routine can be used to 'sanitise'
+    the data by replacing 'smart' characters with their common ASCII
+    equivalents and discarding other troublesome characters.</para>
 
     <programlisting><![CDATA[
 sub sanitise {
@@ -1551,6 +1557,10 @@
 }
 ]]></programlisting>
 
+    <para>Note: It might be safer to simply reject any input with characters
+    in the above range since it implies the browser ignored your charset
+    declaration and guessing the encoding is risky at best.</para>
+
   </answer>
 </qandaentry>
 
@@ -1571,9 +1581,9 @@
     <para>These days, there are a number of alternatives to the DTD and the
     term validation has assumed a broader meaning than simply DTD
     conformance. The most visible alternative to the DTD is the W3C's own
-    XML Schema. <ulink
-    url="http://www.oasis-open.org/committees/relax-ng/">Relax NG</ulink> is
-    a popular alternative developed by OASIS.</para>
+    <ulink url="http://www.w3.org/TR/xmlschema-0/">XML Schema</ulink>.
+    <ulink url="http://www.oasis-open.org/committees/relax-ng/">Relax
+    NG</ulink> is a popular alternative developed by OASIS.</para>
 
     <para>If you design your own class of XML document, you are perfectly
     free to select the system for defining and validating document
@@ -1689,7 +1699,21 @@
     <para><classname>XML::Xerces</classname> provides a wrapper around the
     Apache project's Xerces parser library. Installing Xerces can be
     challenging and the documentation for the Perl API is not great, but it's
-    the only tool offering Schema validation from Perl.</para>
+    the most complete offering for Schema validation from Perl.</para>
+
+  </answer>
+</qandaentry>
+
+<qandaentry id="validation_xml_validator_schema">
+  <question>
+    <para>W3C Schema Validation With <classname>XML::Validator::Schema</classname></para>
+  </question>
+  <answer>
+
+    <para>Sam Tregar's <classname>XML::Validator::Schema</classname> allows
+    you to validate XML documents against a W3C XML Schema. It does not
+    implement the full W3C XML Schema recommendation, but a useful
+    subset.</para>
 
   </answer>
 </qandaentry>
@@ -1947,7 +1971,7 @@
       <title>Bad encoding declaration</title>
 
       <para>An incorrect or missing encoding declaration can cause this. By
-      default the encoding is assumed to be UTF8 so if your data is (say)
+      default the encoding is assumed to be UTF-8 so if your data is (say)
       ISO-8859-1 encoded then you must include an encoding declaration. For
       example:</para>
 
@@ -1999,7 +2023,7 @@
 
     <para>You can find the definitions for <ulink
     url="http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_module_XHTML_Latin_1_Character_Entities"
-    >HTML Latin 1 characters entities</ulink> on the W3C Site.</para>
+    >HTML Latin 1 character entities</ulink> on the W3C Site.</para>
 
     <para>You can include all these character entities into your DTD, so
     that you won't have to worry about it anymore:</para>