Update of /cvsroot/perl-xml/perl-xml-faq
In directory usw-pr-cvs1:/tmp/cvs-serv8157
Modified Files:
perl-xml-faq.xml
Log Message:
- fixed CDATA typo
- Added regex from Andreas Koenig
- Added question on 'invalid character number'
Index: perl-xml-faq.xml
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/perl-xml-faq.xml,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -d -r1.4 -r1.5
--- perl-xml-faq.xml 17 Apr 2002 20:45:49 -0000 1.4
+++ perl-xml-faq.xml 19 Apr 2002 20:38:47 -0000 1.5
@@ -1027,9 +1027,22 @@
</formalpara>
<programlisting><![CDATA[
+use utf8;
+
s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
]]></programlisting>
+ <para>Andreas Koenig has supplied an alternative regular
+ expression:</para>
+
+ <programlisting><![CDATA[
+s/([^\x20-\x7F])/'&#' . ord($1) . ';'/gse;
+ ]]></programlisting>
+
+ <para>This version does not require 'use utf8', does not require a
+ version of Perl which recognises \x{NN}, and also handles characters
+ outside the 0x80-0xFFFF range.</para>
+
<para>Even if you are outputting Latin1, you will need to use a technique
like this for all characters beyond position 255 (eg: the Euro symbol)
since there is no other way to represent them in Latin1.</para>
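As a rough illustration of the substitution discussed above, the following sketch wraps Andreas Koenig's regular expression in a small subroutine. The name `escape_non_ascii` and the sample string are hypothetical and do not come from the FAQ. The sketch also assumes the input is a string of characters (already decoded), not raw UTF-8 bytes; with raw bytes, each byte would be escaped separately and the references would be wrong.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the FAQ): replace every character
# outside the range 0x20-0x7F with a decimal numeric character reference.
# Assumes $text holds characters, not undecoded UTF-8 bytes.
sub escape_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7F])/'&#' . ord($1) . ';'/gse;
    return $text;
}

# chr(0x20AC) is the Euro symbol, which has no representation in Latin1.
my $price = 'Price: ' . chr(0x20AC) . '5';
print escape_non_ascii($price), "\n";   # prints: Price: &#8364;5
```

Because the character class is negated, tabs and newlines are escaped as well (as `&#9;` and `&#10;`), which is still well-formed XML.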
@@ -1731,6 +1744,29 @@
</answer>
</qandaentry>
+ <qandaentry id="invalid_char_num">
+ <question>
+ <para>'reference to invalid character number'</para>
+ </question>
+ <answer>
+
+ <para>The XML spec defines <ulink
+ url="http://www.w3.org/TR/1998/REC-xml-19980210.html#NT-Char">legal
+ characters</ulink> as tab (0x09), carriage return (0x0D), line feed
+ (0x0A) and the legal graphic characters of Unicode. This specifically
+ excludes all other control characters, so this would not be well-formed:</para>
+
+ <programlisting><![CDATA[
+<char></char>
+ ]]></programlisting>
+
+ <para>There really is no easy or standard way to include control
+ characters in XML - binary data must be encoded (for example using
+ <classname>MIME::Base64</classname>).</para>
+
+ </answer>
+ </qandaentry>
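A minimal sketch of the approach described in this answer: detect control characters that XML does not allow, and Base64-encode binary data before placing it in a document. The subroutine name `has_illegal_control_chars`, the `data` element and its `encoding` attribute are hypothetical names chosen for illustration; only `encode_base64` and `decode_base64` come from `MIME::Base64`. The check covers only the control characters below 0x20, not every character the XML spec excludes.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: returns 1 if $text contains a control character
# below 0x20 other than tab (0x09), line feed (0x0A) or carriage
# return (0x0D), and 0 otherwise.
sub has_illegal_control_chars {
    my ($text) = @_;
    return $text =~ /[\x00-\x08\x0B\x0C\x0E-\x1F]/ ? 1 : 0;
}

my $binary = "\x00\x01\x02";

if (has_illegal_control_chars($binary)) {
    # The empty second argument suppresses the trailing newline.
    my $encoded = encode_base64($binary, '');
    print qq(<data encoding="base64">$encoded</data>\n);
    # prints: <data encoding="base64">AAEC</data>
}
```

The consumer of the document has to know that the element content is Base64 and call `decode_base64` on it; XML itself attaches no meaning to the hypothetical `encoding` attribute.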
+
<qandaentry id="using_cdata">
<question>
<para>Embedding Arbitrary Text in XML</para>
@@ -1745,11 +1781,11 @@
example, this XML document ...</para>
<programlisting><![CDATA[
-<code><![CDATA[
+<code><![CDATA[
if($qty < 1) {
print "<p>Invalid quantity!</p>";
}
-]]></code>
+]]>]]><![CDATA[</code>
]]></programlisting>
<para>is equivalent to this document ...</para>