From: Grant M. <gr...@us...> - 2002-04-19 20:38:51
Update of /cvsroot/perl-xml/perl-xml-faq
In directory usw-pr-cvs1:/tmp/cvs-serv8157

Modified Files:
    perl-xml-faq.xml
Log Message:
- fixed CDATA typo
- Added regex from Andreas Koenig
- Added question on 'invalid character number'

Index: perl-xml-faq.xml
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/perl-xml-faq.xml,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -d -r1.4 -r1.5
--- perl-xml-faq.xml 17 Apr 2002 20:45:49 -0000 1.4
+++ perl-xml-faq.xml 19 Apr 2002 20:38:47 -0000 1.5
@@ -1027,9 +1027,22 @@
 </formalpara>
 
 <programlisting><![CDATA[
+use utf8;
+
 s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
 ]]></programlisting>
 
+ <para>Andreas Koenig has supplied an alternative regular
+ expression:</para>
+
+ <programlisting><![CDATA[
+s/([^\x20-\x7F])/'&#' . ord($1) . ';'/gse;
+ ]]></programlisting>
+
+ <para>This version does not require 'use utf8'; does not require a
+ version of Perl which recognises \x{NN} and handles characters
+ outside the 0x80-0xFFFF range.</para>
+
 <para>Even if you are outputting Latin1, you will need to use a
 technique like this for all characters beyond position 255 (eg: the
 Euro symbol) since there is no other way to represent them in Latin1.</para>
@@ -1731,6 +1744,29 @@
 </answer>
 </qandaentry>
 
+ <qandaentry id="invalid_char_num">
+ <question>
+ <para>'reference to invalid character number'</para>
+ </question>
+ <answer>
+
+ <para>The XML spec defines <ulink
+ url="http://www.w3.org/TR/1998/REC-xml-19980210.html#NT-Char">legal
+ characters</ulink> as tab (0x09), carriage return (0x0D), line feed
+ (0x0A) and the legal graphic characters of Unicode. This specifically
+ excludes control characters, so this would not be well-formed:</para>
+
+ <programlisting><![CDATA[
+<char></char>
+ ]]></programlisting>
+
+ <para>Their really is no easy or standard way to include control
+ characters in XML - binary data must be encoded (for example using
+ <classname>MIME::Base64</classname>).</para>
+
+ </answer>
+ </qandaentry>
+
 <qandaentry id="using_cdata">
 <question>
 <para>Embedding Arbitrary Text in XML</para>
@@ -1745,11 +1781,11 @@
 example, this XML document ...</para>
 
 <programlisting><![CDATA[
-<code><![CDATA[
+<code><![CDATA[
 if($qty < 1) {
 print "<p>Invalid quantity!</p>";
 }
-]]></code>
+]]>]]><![CDATA[</code>
 ]]></programlisting>
 
 <para>is equivalent to this document ...</para>