Update of /cvsroot/perl-xml/perl-xml-faq
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18818
Modified Files:
perl-xml-faq.xml
Log Message:
- added Q&A re 'use utf8;' in 5.8
Index: perl-xml-faq.xml
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/perl-xml-faq.xml,v
retrieving revision 1.18
retrieving revision 1.19
diff -u -d -r1.18 -r1.19
--- perl-xml-faq.xml 11 Nov 2004 08:57:33 -0000 1.18
+++ perl-xml-faq.xml 11 Nov 2004 09:25:15 -0000 1.19
@@ -1401,7 +1401,7 @@
</formalpara>
<programlisting><![CDATA[
-use utf8;
+use utf8; # Only needed for 5.6, not 5.8 or later
s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
]]></programlisting>
@@ -1413,8 +1413,8 @@
s/([^\x20-\x7F])/'&#' . ord($1) . ';'/gse;
]]></programlisting>
- <para>This version does not require 'use utf8'; does not require a
- version of Perl which recognises \x{NN} and handles characters
+ <para>This version does not require 'use utf8' with Perl 5.6, does not
+ require a version of Perl which recognises \x{NN}, and handles characters
outside the 0x80-0xFFFF range.</para>
<para>Even if you are outputting Latin1, you will need to use a technique
@@ -1495,7 +1495,7 @@
</formalpara>
<programlisting><![CDATA[
-use utf8;
+use utf8; # Not required with 5.8 or later
my $u_city = "S\x{E3}o Paulo";
my $l_city = pack("C*", unpack('U*', $u_city));
@@ -1581,6 +1581,33 @@
</qandaentry>
+ <qandaentry id="use_utf8">
+ <question>
+ <para>What does 'use utf8;' do?</para>
+ </question>
+
+ <answer>
+
+ <para>In Perl 5.8 and later, the sole purpose of the 'use utf8;' pragma is
+ to tell Perl that your script is written in UTF-8 (i.e. any non-ASCII or
+ multibyte characters in the source should be interpreted as UTF-8). So if
+ your code is plain ASCII, you don't need the pragma.</para>
+
+ <para>The original UTF-8 support in Perl 5.6 required the pragma to
+ enable wide character support for built-in functions (such as length)
+ and the regular expression engine. This is no longer necessary in 5.8
+ since Perl automatically uses character rather than byte semantics
+ with strings that have the utf8 flag set.</para>
+
+ <para>You can find out more about how Unicode handling changed in
+ Perl 5.8 from the <ulink
+ url="http://search.cpan.org/dist/perl/pod/perl58delta.pod">perl58delta.pod</ulink>
+ file that ships with Perl.</para>
+
+ </answer>
+
+ </qandaentry>
+
<qandaentry id="encoding_common">
<question>
<para>What are some commonly encountered problems with encodings?</para>
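The point made in the new 'use_utf8' entry can be illustrated with a short script. This is a minimal sketch and not part of the commit. It assumes Perl 5.8 or later, reuses the "S\x{E3}o Paulo" string from the FAQ listing, and introduces the variable names $chars, $bytes and $octets purely for illustration. The source is plain ASCII, so no 'use utf8;' is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The non-ASCII character is written as an escape, so this source file
# is plain ASCII and 'use utf8;' is not required (Perl 5.8+ assumed).
my $city = "S\x{E3}o Paulo";

# length() counts characters here, with no pragma in effect.
my $chars = length($city);

# Encode a copy to UTF-8 octets to compare against the byte count.
# utf8::encode() is always available in 5.8+, even without 'use utf8;'.
my $bytes = $city;
utf8::encode($bytes);
my $octets = length($bytes);    # U+00E3 occupies two octets in UTF-8

print "$chars characters, $octets bytes\n";
```

Run under 5.8 or later, this should report 9 characters and 10 bytes. The pragma only becomes necessary if the literal is typed into the source as a raw UTF-8 character rather than as an escape.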