From: Grant M. <gr...@us...> - 2008-03-18 09:11:40
Update of /cvsroot/perl-xml/perl-xml-faq
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26347

Modified Files:
    perl-xml-faq.xml
Log Message:
- add 'quick answer - XML::LibXML' and freshen up various other answers

Index: perl-xml-faq.xml
===================================================================
RCS file: /cvsroot/perl-xml/perl-xml-faq/perl-xml-faq.xml,v
retrieving revision 1.26
retrieving revision 1.27
diff -u -d -r1.26 -r1.27
--- perl-xml-faq.xml 14 Nov 2006 08:39:28 -0000 1.26
+++ perl-xml-faq.xml 18 Mar 2008 09:11:37 -0000 1.27
@@ -15,11 +15,7 @@
       </para></authorblurb>
     </author>
     <copyright>
-      <year>2002</year>
-      <year>2003</year>
-      <year>2004</year>
-      <year>2005</year>
-      <year>2006</year>
+      <year>2002 - 2008</year>
       <holder>Grant McLean</holder>
     </copyright>
 
@@ -224,6 +220,22 @@
       </answer>
     </qandaentry>
 
+    <qandaentry id="quick_choice">
+      <question>
+        <para>The Quick Answer</para>
+      </question>
+      <answer>
+
+        <para>For general purpose XML processing with Perl, <classname>XML::LibXML</classname>
+        is usually the best choice. It is stable, fast and powerful. To make the most of the
+        module you need to learn and use XPath expressions. The documentation for XML::LibXML is
+        its biggest weakness.</para>
+
+        <para>Other modules may be better suited to particular niches - as discussed below.</para>
+
+      </answer>
+    </qandaentry>
+
     <qandaentry id="tree_vs_stream">
       <question>
         <para>Tree versus stream parsers</para>
@@ -347,14 +359,13 @@
       <para>If your needs are simple, try <classname>XML::Simple</classname>. It's
       loosely classified as a tree based parser although the 'tree' is really
       just nested Perl hashes and arrays. You may need to swot up on
-      Perl references (<command>perldoc perlreftut</command>) to take advantage
-      of this module.</para>
+      Perl references (see: <command>perldoc perlreftut</command>) to take
+      advantage of this module.</para>
 
       <para>If you're looking for a more powerful tree based approach, try
-      <classname>XML::XPath</classname>. This module offers a DOM style API
-      with the added bonus of XPath support. If speed is critical, you'll
-      find that <classname>XML::LibXML</classname> is much faster but a bit
-      more 'bleeding edge'.</para>
+      <classname>XML::LibXML</classname> for a standards compliant DOM or
+      <classname>XML::Twig</classname> for a more 'Perl-like' API. Both of
+      these modules support XPath.</para>
 
       <para>If you've decided to use a stream based approach, head directly
       for SAX. The <classname>XML::SAX</classname> distribution
@@ -365,12 +376,6 @@
       C-based parser library ('expat' by James Clark) as
       <classname>XML::Parser</classname>, for faster parsing.</para>
 
-      <para>Another option worthy of investigation is
-      <classname>XML::Twig</classname>. This hybrid module combines the
-      convenience of a tree approach with the lower memory demands of the
-      stream style. You configure the parser and it gives you the document in
-      chunks (bits of the tree or 'twigs').</para>
-
       <para>Finally, the latest trendy buzzword in Java and C# circles is
       'pull' parsing (see <ulink url="http://www.xmlpull.org/"
       >www.xmlpull.org</ulink>). Unlike SAX, which 'pushes' events at your
@@ -481,26 +486,29 @@
 
       <para><classname>XML::LibXML</classname> provides a Perl wrapper around
       the GNOME Project's libxml2 library. This module was originally written
-      by Matt Sergeant and is now actively maintained by Christian Glahn. It
-      is very fast, complete and stable. It can run in validating or
-      non-validating modes and offers a DOM with XPath support. The DOM and
-      associated memory management is implemented in C which offers significant
-      performance advantages over DOM trees built from Perl datatypes. The
-      <classname>XML::LibXML::SAX::Builder</classname> module allows a libxml2
-      DOM to be constructed from SAX events.
+      by Matt Sergeant and Christian Glahn and is now actively maintained by
+      Petr Pajas. It is very fast, complete and stable. It can run in
+      validating or non-validating modes and offers a DOM with XPath support.
+      The DOM and associated memory management is implemented in C which offers
+      significant performance advantages over DOM trees built from Perl
+      datatypes. The <classname>XML::LibXML::SAX::Builder</classname> module
+      allows a libxml2 DOM to be constructed from SAX events.
       <classname>XML::LibXML::SAX</classname> is a SAX parser based on the
       libxml2 library.</para>
 
-      <para><classname>XML::LibXML</classname> can be used to parse HTML (4.0
-      strict) and SGML files into DOM structures - which is especially useful
-      when converting other formats to XML.</para>
+      <para><classname>XML::LibXML</classname> can also be used to parse HTML
+      files into DOM structures - which is especially useful when converting
+      other formats to XML or using XPath to 'scrape' data from web
+      pages.</para>
 
       <para>The libxml2 library is not part of the
-      <classname>XML::LibXML</classname> distribution. The source is available
-      for download from <ulink url="http://xmlsoft.org">xmlsoft.org</ulink>;
-      it is a standard package in most Linux distributions; it can be compiled
-      on numerous other platforms; and it is bundled with PPM packages of
-      <classname>XML::LibXML</classname> for Windows.</para>
+      <classname>XML::LibXML</classname> distribution. Precompiled
+      distributions of the libxml2 library and the
+      <classname>XML::LibXML</classname> Perl wrapper are available for most
+      operating systems. The library is a standard package in most Linux
+      distributions; it can be compiled on numerous other platforms; and it is
+      bundled with PPM packages of <classname>XML::LibXML</classname> for
+      Windows.</para>
 
       <para>For early access to upcoming features such as W3C Schema and RelaxNG
       validation, you can access the CVS version of <classname>XML::LibXML</classname> at:</para>
@@ -518,13 +526,10 @@
       </question>
       <answer>
 
-      <para>Matt Sergeant's <classname>XML::XPath</classname> module provides a
-      DOM implementation (in Perl) which supports XPath queries. It can't
-      rival <classname>XML::LibXML::SAX</classname> for speed but it may be
-      easier to install - especially if you don't have a compiler. Parsing XML
-      documents is performed by the expat library via
-      <classname>XML::Parser</classname>. You can serialise the DOM to SAX
-      events.</para>
+      <para>Matt Sergeant's <classname>XML::XPath</classname> module was the first
+      Perl DOM implementation to support XPath. It has largely been supplanted by
+      <classname>XML::LibXML</classname> which is better maintained and more
+      powerful.</para>
 
       </answer>
     </qandaentry>
@@ -566,6 +571,16 @@
       module also supports building the tree from SAX events or using a
       simple Perl data structure to drive a SAX pipeline.</para>
 
+      <para>If you are using <classname>XML::Simple</classname>, you should
+      read "<ulink url="http://www.perlmonks.org/index.pl?node_id=218480">Does
+      your XML::Simple code pass the strict test?</ulink>" for a discussion of
+      common pitfalls and ways to avoid them.</para>
+
+      <para>If you are becoming frustrated by the limitations of
+      <classname>XML::Simple</classname>, see: "<ulink
+      url="http://www.perlmonks.org/index.pl?node_id=490846">Stepping up from
+      XML::Simple to XML::LibXML</ulink>".</para>
+
       </answer>
     </qandaentry>
 
@@ -588,8 +603,8 @@
       <para>Another advantage of <classname>XML::Twig</classname> is that it is
       not constrained by the tyranny of DOM compliance. Instead, it offers a
       number of conveniences to help the experienced Perl programmer feel right
-      at home. The official home page for <classname>XML::Twig</classname> is
-      <ulink
+      at home. <classname>XML::Twig</classname> also supports XPath
+      expressions. The module's official home page for is <ulink
       url="http://www.xmltwig.com/">http://www.xmltwig.com/</ulink>.</para>
 
       </answer>
@@ -643,13 +658,13 @@
       </question>
       <answer>
 
-      <para>Matt Sergeant's <classname>XML::PYX</classname> comes with
-      some wrapper scripts for working with XML files using command line
-      pipelines. The PYX notation allows you to apply commands like
-      <command>grep</command> and <command>sed</command> to specific parts of
-      the XML document (eg: element names, attribute values, text content).
-      For example, this one-liner provides a report of how many times each
-      type of element is used in a document:</para>
+      <para>Although written in Perl, Matt Sergeant's
+      <classname>XML::PYX</classname> is really designed for working with XML
+      files using shell command pipelines. The PYX notation allows you to
+      apply commands like <command>grep</command> and <command>sed</command> to
+      specific parts of the XML document (eg: element names, attribute values,
+      text content). For example, this one-liner provides a report of how many
+      times each type of element is used in a document:</para>
 
       <programlisting><![CDATA[
 pyx doc.xml | sed -n 's/^(//p' | sort | uniq -c