[Htmlparser-cvs] htmlparser/docs/wiki/index.php Benchmarks,NONE,1.1 BlockFeedback,NONE,1.1 Collectin
Brought to you by:
derrickoswald
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/index.php
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki/index.php

Added Files:
	Benchmarks BlockFeedback CollectingParameter CompositePattern
	CustomTagExtraction CustomTagLinks CustomVisitorLinks
	EmailExtraction EnableFeedback ExternalIterators FactoryMethod
	FeedbackMechanism FilterLinks FrequentlyAskedQuestions HomePage
	ImageExtraction InternalIterators IteratorPattern JavaBeans
	LexerLinks LinkBeanLinks LinkExtraction ParserDesign
	PatternStories PostOperation RSSFeeds ReverseHtml
	SamplePrograms SearchingForData SomikRaha StrategyPattern
	StringExtraction TemplateMethod TestDrivenDevelopment
	UsingCookiesWithParser VisitorLinks VisitorPattern WebCrawler
	WebRipper WritingYourOwnScanners
Log Message:
Use WikiCapturer to pull Wiki pages locally.

--- NEW FILE: CustomTagLinks ---
<?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Custom Tag Links, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link
rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: CustomTagLinks" href="CustomTagLinks?action=viewsource&version=3" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Custom Tag Links</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <div class="wikitext"><p><b>Using Custom Tags to Extract Links</b></p> <p>Custom tags let you alter the parser's behaviour as it parses. Here, a subclass of LinkTag overrides doSemanticAction () to print each link's text and URL. Registering it with a PrototypicalNodeFactory makes the parser create it in place of the standard LinkTag:</p> <pre>
import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.ParserException;

class MyLinkTag extends LinkTag
{
    public void doSemanticAction () throws ParserException
    {
        System.out.print ("\"" + getLinkText () + "\" => ");
        System.out.println (getLink ());
    }
}

public class LinkDemo
{
    public static void main (String[] args) throws ParserException
    {
        Parser parser = new Parser ("http://urlIWantToParse.com");
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
        factory.registerTag (new MyLinkTag ());
        parser.setNodeFactory (factory);
        // just parsing the nodes executes doSemanticAction
        for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
            e.nextNode ();
    }
}</pre> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the screen" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id:
DB.php,v 1.13 2002/07/02 15:19:49 cox Exp From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp --> </html> --- NEW FILE: ReverseHtml --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Reverse Html, PhpWiki" /> <meta name="description" content="Often, it might be desired to modify the html being reconstructed. In such a case, you must change the tag's attributes prior to calling toHtml().
For example, if the tag in question is a link tag, and you wish to modify the href, do this:" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: ReverseHtml" href="ReverseHtml?action=viewsource&version=7" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Reverse Html</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56
derrickoswald Exp $ --> <div class="wikitext"><p><b>Reverse Html Rendering</b></p> <p>To get back the HTML representation of a web page, call toHtml () on each top-level node. toHtml () works recursively, so each node's output includes its children. Here's one way to do it:</p> <pre>
import org.htmlparser.Parser;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.ParserException;

public class ToHtmlDemo
{
    public static void main (String[] args) throws ParserException
    {
        Parser parser = new Parser ("http://urlIWantToParse.com");
        StringBuffer html = new StringBuffer (4096);
        // append the HTML of each top-level node, children included
        for (NodeIterator i = parser.elements (); i.hasMoreNodes (); )
            html.append (i.nextNode ().toHtml ());
        System.out.println (html);
    }
}</pre> <p>You may often want to modify the HTML being reconstructed. In that case, change the tag's attributes before calling toHtml (). For example, if the tag in question is a link tag and you wish to modify the href, do this:</p> <pre>
linkTag.setLink ("http://newUrlString");
linkTag.toHtml ();</pre> <p>This is equivalent to:</p> <pre>
linkTag.setAttribute ("href", "http://newUrlString");
linkTag.toHtml ();</pre> <p>The latter form works on any tag, but according to the <a href="http://www.w3.org/TR/html4/" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />HTML</span> specification</a> few other tags have an HREF attribute. The <i>toHtml()</i> method applies to all nodes, not just tags. For a tag, it reconstructs the tag from its attributes (at the atomic level) and its children (at the composite level).</p> <p>You can also change the name of the tag like so:</p> <pre>
tag.setTagName (newTagName);</pre> <p>There are also numerous ways to add, remove or change the attributes of a tag. For example, to set the ID attribute to "EditArea", adding it if it is not already present, use:</p> <pre>
tag.setAttribute ("id", "EditArea", '"'); // the third argument is the quote character</pre> <p>Whole tags can also be added to and removed from the list of children held by each tag.
For example, to add a <P> tag at the same level as another tag:</p> <pre>
newTag = new Tag ();
newTag.setTagName ("P");
tag.getParent ().getChildren ().add (newTag);</pre> <p>Be careful: getChildren () may return null for an arbitrary tag, so check the result before adding to it.</p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the screen" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id:
ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp --> </html> --- NEW FILE: PatternStories --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Pattern Stories, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link
rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: PatternStories" href="PatternStories?action=viewsource&version=3" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Pattern Stories</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <div class="wikitext"><p><b>Pattern Stories</b></p> <p>The parser uses the following patterns:</p> <ul> <li><a href="../index.php/FactoryMethod" class="wiki">FactoryMethod</a></li> <li><a href="../index.php/TemplateMethod" class="wiki">TemplateMethod</a></li> <li><a href="../index.php/IteratorPattern" class="wiki">IteratorPattern</a></li> <li><a href="../index.php/VisitorPattern" class="wiki">VisitorPattern</a></li> <li><a href="../index.php/CollectingParameter" class="wiki">CollectingParameter</a></li> <li><a href="../index.php/StrategyPattern" class="wiki">StrategyPattern</a></li> <li><a
href="../index.php/CompositePattern" class="wiki">CompositePattern</a></li> </ul> <p>--<a href="../index.php/SomikRaha" class="wiki">SomikRaha</a></p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the screen" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: PatternStories,v 1.1 2004/05/30 01:43:56
derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp --> </html> --- NEW FILE: StringExtraction --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="String Extraction, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link
rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: StringExtraction" href="StringExtraction?action=viewsource&version=9" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - String Extraction</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <div class="wikitext"><p><b>String Extraction</b></p> <p>To get all the text content from a web page, use the TextExtractingVisitor, like so:</p> <pre>
import org.htmlparser.Parser;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.TextExtractingVisitor;

public class StringDemo
{
    public static void main (String[] args) throws ParserException
    {
        Parser parser = new Parser ("http://pageIwantToParse.com");
        TextExtractingVisitor visitor = new TextExtractingVisitor ();
        parser.visitAllNodesWith (visitor);
        System.out.println (visitor.getExtractedText ());
    }
}</pre> <p>If you want more browser-like behaviour, use the StringBean, like so:</p> <pre>
import org.htmlparser.beans.StringBean;

public class StringDemo
{
    public static void main (String[] args)
    {
        StringBean sb = new StringBean ();
        sb.setLinks (false);                   // don't include link URLs in the text
        sb.setReplaceNonBreakingSpaces (true); // turn non-breaking spaces into ordinary spaces
        sb.setCollapse (true);                 // collapse runs of whitespace into one space
        sb.setURL ("http://pageIwantToParse.com");
        System.out.println (sb.getStrings ());
    }
}</pre> <p><b>thank you</b></p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the screen" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: StringExtraction,v 1.1 2004/05/30 01:43:56
derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp --> </html> --- NEW FILE: LinkExtraction --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Link Extraction, PhpWiki" /> <meta name="description" content="Is there a preferred method? There seem to be too many ways."
/> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: LinkExtraction" href="LinkExtraction?action=viewsource&version=10" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Link Extraction</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <div class="wikitext"><p><b>Link Extraction</b></p> <p>There
are many ways of extracting links.</p> <ul> <li><a href="../index.php/VisitorLinks" class="named-wiki" title="VisitorLinks">Use an ObjectFindingVisitor</a></li> <li><a href="../index.php/CustomVisitorLinks" class="named-wiki" title="CustomVisitorLinks">Use a custom Visitor</a></li> <li><a href="../index.php/LinkBeanLinks" class="named-wiki" title="LinkBeanLinks">Use a LinkBean</a></li> <li><a href="../index.php/CustomTagLinks" class="named-wiki" title="CustomTagLinks">Use a custom Tag</a></li> <li><a href="../index.php/FilterLinks" class="named-wiki" title="FilterLinks">Use a NodeFilter</a></li> <li><a href="../index.php/LexerLinks" class="named-wiki" title="LexerLinks">Use a low-level Lexer</a></li> </ul> <p>Is there a preferred method? There seem to be too many ways.</p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML!
icons from "hanging off the bottom of the screen" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp
$ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: ExternalIterators --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="External Iterators, PhpWiki" /> <meta name="description" content="You should think of this only when you want to conduct a really quick search, and the moment you've found what you've wanted, you want to stop parsing. The iterator here drives the parsing." 
/> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: ExternalIterators" href="ExternalIterators?action=viewsource&version=2" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - External Iterators</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <div class="wikitext"><p><b>External
Iterators</b></p> <p>You can use an external iterator to drive the entire parsing process, like so:</p> <pre>
for (NodeIterator i = parser.elements(); i.hasMoreNodes();) {
    Node node = i.nextNode();
    if (node instanceof LinkTag) {
        // process the link
    }
    if (node instanceof ImageTag) {
        // process the image
    }
}</pre> <p>Consider this approach only when you want a really quick search that stops parsing the moment you have found what you are looking for. Here the iterator drives the parsing, so you can leave the loop as soon as you have your result.</p> <p>--<a href="../index.php/SomikRaha" class="wiki">SomikRaha</a></p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $ -->
</html> --- NEW FILE: PostOperation --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Post Operation, PhpWiki" /> <meta name="description" content="The standard HTTP request
submitted by the parser is a GET. This note describes how to use POST, which is the usual request submitted by a form." /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: PostOperation" href="PostOperation?action=viewsource&version=13" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Post Operation</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse -->
<div class="wikitext"><h4>POST Operation</h4> <p>The standard HTTP request submitted by the parser is a GET. This note describes how to use POST, which is the usual request submitted by a form.</p> <p>As an example, we'll submit a form to the U.S. Postal Service web site.<br /> <i>Note: this is suboptimal; the Postal Service provides tools for this type of thing: <a href="http://www.uspswebtools.com" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />http://www.uspswebtools.com</span></a></i><br /></p> <p>On the USPS web site, the page <a href="http://www.usps.com/zip4/citytown.htm" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />http://www.usps.com/zip4/citytown.htm</span></a> has the following FORM, which asks for a zip code and returns the cities or towns covered by that zip code (only the form elements are shown; all formatting markup has been removed):</p> <pre>
<form NAME="frmzip" ACTION="zip_response.jsp" METHOD="post" OnSubmit="return validate(frmzip)">
<input type="text" id="zipcode" name="zipcode" size="5" maxlength="5" TABINDEX="10">
<input TYPE="image" NAME="Submit" SRC="/zip4/images/submit.jpg" BORDER="0" WIDTH="50" HEIGHT="17" ALT="Submit" TABINDEX="11">
</form></pre> <p>From this we can see that the <tt>METHOD</tt> is <tt>POST</tt> and that the form is submitted to <tt>zip_response.jsp</tt>. This URL is relative to the page containing the form, so when the <tt>Submit</tt> input is clicked the form is submitted to <tt>http://www.usps.com/zip4/zip_response.jsp</tt>. The only <tt>input</tt> element other than <tt>Submit</tt> is a single <tt>text</tt> field that takes 5 or fewer characters.
Other types of input elements are described in <a href="http://www.w3.org/TR/html4/interact/forms.html" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />http://www.w3.org/TR/html4/interact/forms.html</span></a>.</p> <p>The basic operation is to pass a fully prepared <tt>HttpURLConnection</tt>, connected to the <tt>POST</tt> target URL, into the <tt>Parser</tt>, either in the constructor or via the <tt>setConnection()</tt> method. To prepare the connection, call <tt>setRequestMethod()</tt> to select the <tt>POST</tt> operation, and use <tt>setRequestProperty()</tt> and other explicit method calls (such as <tt>setDoOutput()</tt>) to set the request headers and options. Then write the input fields, joined with ampersands (<tt>"input1=value1&input2=value2&..."</tt>), to a <tt>PrintWriter</tt> wrapped around the stream returned by <tt>getOutputStream()</tt>. Field values containing spaces or other special characters should be URL-encoded first, for example with <tt>java.net.URLEncoder</tt>.</p> <p>The following sample program illustrates these principles using a <tt>StringBean</tt>, but the same code could be used with a <tt>Parser</tt> by replacing the last three lines in the <tt>try</tt> block with:</p> <pre>
parser = new Parser ();
parser.setConnection (connection);
// ... do parser operations</pre> <p><a href="http://htmlparser.sourceforge.net/images/Zip.java" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />Source</span> Code.</a> <a href="http://htmlparser.sourceforge.net/images/Zip.html" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />Pretty</span> Print Source Code</a></p> <pre>
/*
 * Zip.java
 * POST zip code to look up cities.
 *
 * Created on April 20, 2003, 11:09 PM
 */

import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

import org.htmlparser.beans.StringBean;

/**
 * POST zip code to look up cities.
 * @author Derrick Oswald
 */
public class Zip
{
    String mText; // text extracted from the response to the POST request

    /**
     * Creates a new instance of Zip
     */
    public Zip (String code)
    {
        URL url;
        HttpURLConnection connection;
        StringBuffer buffer;
        PrintWriter out;
        StringBean bean;

        try
        {
            // from the 'action' (relative to the referring page)
            url = new URL ("http://www.usps.com/zip4/zip_response.jsp");
            connection = (HttpURLConnection)url.openConnection ();
            connection.setRequestMethod ("POST");

            connection.setDoOutput (true);
            connection.setDoInput (true);
            connection.setUseCaches (false);

            // more or less of these may be required
            // see Request Header Definitions: http://www.ietf.org/rfc/rfc2616.txt
            connection.setRequestProperty ("Accept-Charset", "*");
            connection.setRequestProperty ("Referer", "http://www.usps.com/zip4/citytown.htm");
            connection.setRequestProperty ("User-Agent", "Zip.java/1.0");

            buffer = new StringBuffer (1024);
            // 'input' fields separated by ampersands (&)
            buffer.append ("zipcode=");
            buffer.append (code);
            // buffer.append ("&");
            // etc.

            // send the form data as the body of the POST request
            out = new PrintWriter (connection.getOutputStream ());
            out.print (buffer);
            out.close ();

            // read and extract the text of the response
            bean = new StringBean ();
            bean.setConnection (connection);
            mText = bean.getStrings ();
        }
        catch (Exception e)
        {
            mText = e.getMessage ();
        }
    }

    public String getText ()
    {
        return (mText);
    }

    /**
     * Program mainline.
     * @param args The zip code to look up.
     */
    public static void main (String[] args)
    {
        if (0 >= args.length)
            System.out.println ("Usage: java Zip <zipcode>");
        else
            System.out.println (new Zip (args[0]).getText ());
    }
}</pre> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $ -->
</html> --- NEW FILE: SearchingForData --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Searching For Data, PhpWiki" /> <meta name="description" content="Searching for data is one of the most challenging tasks in a web page due to its seemingly unstructured (or badly structured) form. Complex searches are now possible with the parser in a simple to use API.
Here's an example :" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: SearchingForData" href="SearchingForData?action=viewsource&version=4" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Searching For Data</title> </head> <!-- End head --> <!-- Begin body --> <body> <!-- Begin top --> <!-- End top --> <!-- Begin browse --> <div
class="wikitext"><p>Searching for data in a web page is one of the most challenging tasks, because the page is often unstructured (or badly structured). The parser makes complex searches possible through a simple-to-use API. Here's an example:</p> <p>Suppose we are looking at a page with the following HTML:</p> <pre>
<html>
...
<body>
<table>
  <tr>
    <td><font size="-1">Name:<b><i>John Doe</i></b></font></td>
    ..
  </tr>
  <tr>
    ..
  </tr>
</table>
</body>
</html></pre> <p>We'd like to extract the information corresponding to the field "Name". We can do this by using the fact that the name appears two tags after "Name".</p> <p>Code to achieve this would look like:</p> <pre>
Node nodes [] = parser.extractAllNodesThatAre(TableTag.class);
// Get the first table found
TableTag table = (TableTag)nodes[0];
// Find the position of Name.
StringNode [] stringNodes = table.digupStringNode("Name");
// We assume that the first node that matched is the one we want.
StringNode name = stringNodes[0];
// Navigate to its parent, the column tag <td>
CompositeTag td = name.getParent();
// From the parent, find the position of "Name"
int posOfName = td.findPositionOf(name);
// It's easy now to navigate to John Doe: it is 3 positions away
// ("Name:", then <b>, then <i>, then "John Doe")
Node expectedName = td.childAt(posOfName + 3);
</pre> <hr /><p>You can also move up the parent tree, e.g. when the data is in separate columns:</p> <pre>
<html>
...
<body>
<table>
  <tr>
    <td><font size="-1">Name:</font></td>
    <td><font size="-1">John Doe</font></td>
  </tr>
  <tr>
    ..
  </tr>
</table>
</body>
</html></pre> <p>We'd like to perform the same search on "Name".</p> <p>Code to achieve this would look like:</p> <pre>
Node nodes [] = parser.extractAllNodesThatAre(TableTag.class);
// Get the first table found
TableTag table = (TableTag)nodes[0];
// Find the position of Name.
StringNode [] stringNodes = table.digupStringNode("Name");
// We assume that the first node that matched is the one we want. We
// navigate to its parent (column <td>)
CompositeTag td = stringNodes[0].getParent();
// Navigate to its parent (row <tr>)
CompositeTag tr = td.getParent();
// From the parent, find the position of the column
int columnNo = tr.findPositionOf(td);
// It's easy now to navigate to John Doe, as we know it is in the next column
TableColumn nextColumn = (TableColumn)tr.childAt(columnNo + 1);
// The name is the second item in the column tag
Node expectedName = nextColumn.childAt(1);</pre> </div> <!-- End browse --> <!-- Begin bottom --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $ -->
</html> --- NEW FILE: RSSFeeds --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: RSSFeeds,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="RSSFeeds, PhpWiki" /> <meta name="description" content="Project name: HTML Parser Project description: HTML Parser is a library, written in Java, which allows you
to parse HTML (HTML 4.0 supported). It has been used by people on live projects. Developers appreciate how easy it is to use. The architecture is flexible, allowing you to extend it easily. Developers on project: 16 Project administrators: &#60;a href=&#34;http://sourceforge.net/users/derrickoswald/&#34;&#62;derrickoswald&#60;/a&#62;, &#60;a href=&#34;http://sourceforge.net/users/somik/&#34;&#62;somik&#60;/a&#62; Activity percentile (last week): 98.3413% Most recent daily statistics (24 Jan 2004): Ranking: 251, Activity percentile: 98.34%, Downloadable files: 25615 total downloads to date Mos... [truncated message content] |