From: Grant M. <gr...@us...> - 2006-11-06 08:26:39
Update of /cvsroot/perl-xml/perl-xml-faq
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv16337

Modified Files:
	perl-xml-faq.xml
Log Message:
- add Q&A re namespaces and XPath

Index: perl-xml-faq.xml
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/perl-xml-faq.xml,v
retrieving revision 1.23
retrieving revision 1.24
diff -u -d -r1.23 -r1.24
--- perl-xml-faq.xml	23 Oct 2006 02:54:34 -0000	1.23
+++ perl-xml-faq.xml	6 Nov 2006 08:26:32 -0000	1.24
@@ -19,6 +19,7 @@
       <year>2003</year>
       <year>2004</year>
       <year>2005</year>
+      <year>2006</year>
       <holder>Grant McLean</holder>
     </copyright>
 
@@ -2332,6 +2333,103 @@
       </answer>
     </qandaentry>
 
+    <qandaentry id="namespaces_xpath">
+      <question>
+        <para>Using XPath with Namespaces</para>
+      </question>
+      <answer>
+
+        <para>People often experience difficulty getting their XPath expressions
+        to match when they first use <classname>XML::LibXML</classname> to
+        process an XML document containing namespaces. For example, consider
+        this XHTML document with an embedded SVG section:</para>
+
+        <programlisting><![CDATA[
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+  <title>Sample Document</title>
+</head>
+<body>
+
+<h1>An HTML Heading</h1>
+
+<s:svg xmlns:s="http://www.w3.org/2000/svg" width="300" height="200">
+  <s:rect style="fill: #eeeeee; stroke: #000000; stroke-width: 1;"
+    width="80" height="30" x="60" y="50" />
+  <s:text style="font-size: 12px; fill: #000066; font-family: sans-serif;"
+    x="70" y="70">Label One</s:text>
+</s:svg>
+
+</body>
+</html>
+        ]]></programlisting>
+
+        <para>The elements in the SVG section each use the namespace prefix
+        's' which is bound to the URI 'http://www.w3.org/2000/svg'. The
+        prefix 's' is completely arbitrary and is merely a mechanism for
+        associating the URI with the elements. As a programmer, you will
+        perform matches against namespace URIs not prefixes.</para>
+
+        <para>The elements in the XHTML wrapper do not have namespace prefixes,
+        but are bound to the URI 'http://www.w3.org/1999/xhtml' by way of the
+        default namespace declaration on the opening <html> tag.</para>
+
+        <para>You might expect that you could match all the 'h1' elements using
+        this XPath expression ...</para>
+
+        <programlisting><![CDATA[
+//h1
+        ]]></programlisting>
+
+        <para>... however, that won't work since the namespace URI is effectively
+        part of the name of the element you're trying to match.</para>
+
+        <para>One approach would be to fashion an XPath query which ignored the
+        namespace portion of element names and matched only on the 'local name'
+        portion. For example:</para>
+
+        <programlisting><![CDATA[
+//*[local-name() = 'h1']
+        ]]></programlisting>
+
+        <para>A better approach is to match the namespace portion as well. To
+        achieve that, the first step is to use
+        <classname>XML::LibXML::XPathContext</classname> to declare a namespace
+        prefix. Then, the prefix can be used in the XPath expression:</para>
+
+        <programlisting><![CDATA[
+my $parser = XML::LibXML->new();
+my $doc = $parser->parse_file('sample.xhtml');
+
+my $xpc = XML::LibXML::XPathContext->new($doc);
+$xpc->registerNs(xhtml => 'http://www.w3.org/1999/xhtml');
+
+foreach my $node ($xpc->findnodes('//xhtml:h1')) {
+  print $node->to_literal, "\n";
+}
+        ]]></programlisting>
+
+        <para>The same technique can be used to match 'text' elements in the
+        SVG section:</para>
+
+        <programlisting><![CDATA[
+$xpc->registerNs(svg => 'http://www.w3.org/2000/svg');
+foreach my $node ($xpc->findnodes('//svg:text')) {
+  print $node->to_literal, "\n";
+}
+        ]]></programlisting>
+
+        <note><para>The <classname>XML::LibXML::XPathContext</classname> module
+        has been included in the <classname>XML::LibXML</classname> distribution
+        since version 1.61. Prior to that it was in its own separate
+        distribution on CPAN.</para></note>
+
+      </answer>
+    </qandaentry>
+
   </qandadiv>
 
   <qandadiv id="misc"><title>Miscellaneous</title>