[Htmlparser-cvs] htmlparser/docs/docs Benchmarks.html,NONE,1.1 BlockFeedback.html,1.3,1.4 Collecting
Update of /cvsroot/htmlparser/htmlparser/docs/docs
In directory sc8-pr-cvs1:/tmp/cvs-serv24811/htmlparser/docs/docs

Modified Files:
	BlockFeedback.html CollectingParameter.html CompositePattern.html
	CustomTagExtraction.html EmailExtraction.html EnableFeedback.html
	ExternalIterators.html FactoryMethod.html FeedbackMechanism.html
	FirstName.html FrequentlyAskedQuestions.html FullName.html
	ImageExtraction.html InternalIterators.html IteratorPattern.html
	JavaBeans.html LastName.html LinkExtraction.html ParserDesign.html
	ParsingXml.html PatternStories.html PostOperation.html
	ReverseHtml.html SamplePrograms.html SearchingForData.html
	SomikRaha.html StrategyPattern.html StringExtraction.html
	TagFindingVisitor.html TagScanner.html TemplateMethod.html
	TestDrivenDevelopment.html TextExtractingVisitor.html
	UnitTestingPdf.html UnitTestingXsl.html UsingCookiesWithParser.html
	VisitorPattern.html WebCrawler.html WebRipper.html
	WritingYourOwnScanners.html index.html
Added Files:
	Benchmarks.html
Log Message:
Update version headers to 1.4-20031026 and update changelog.

--- NEW FILE: Benchmarks.html ---
<html><head><title>Benchmarks</title></head><body>
<div class="wikitext">
<p>Peter Lin, who works on the <a href="http://jakarta.apache.org/jmeter/index.html" class="namedurl"><span style="white-space: nowrap">JMeter</span></a> project has performed some benchmarks that indicate htmlparser is 40% to 600% faster than JTidy:
<pre>
                10      20      30      40      50     100     500   Yahoo    Cnet
Htmlparser    80.0   126.4   160.4   200.4   236.4   400.6  1630.2   474.4  1251.8
Tidy         498.6   531.0   626.8   658.8   687.0   849.4  2319.4   965.2  2049.0
Delta         6.23     4.2    3.91    3.29    2.91    2.12    1.42    2.03    1.64
<p>Full details are available in a <a href="http://htmlparser.sourceforge.net/benchmarks.zip" class="namedurl"><span style="white-space: nowrap">zip</span> file</a>.
<div id="actionbar" class="toolbar">
<hr class="printer" noshade="noshade" />
<p class="editdate">Last edited on Wednesday, October 1, 2003 6:54:10 am.
<hr class="toolbar" noshade="noshade" /> </body></html> Index: BlockFeedback.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/BlockFeedback.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** BlockFeedback.html 24 Aug 2003 18:44:10 -0000 1.3 --- BlockFeedback.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>Block Feedback</title></head><body><DIV class="wikitext"> ! <P><B>Block Feedback</B></P> ! <P>The parser sends warning and error messages to standard output by default. You might want to block that. To achieve this, use a different feedback object, like so:</P> ! <PRE> Parser parser = new Parser( "http://...", --- 1,13 ---- ! <html><head><title>Block Feedback</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Block Feedback ! ! <p>The parser sends warning and error messages to standard output by default. You might want to block that. To achieve this, use a different feedback object, like so: ! ! <pre> ! Parser parser = new Parser( "http://...", *************** *** 8,12 **** DefaultParserFeedback.QUIET ) ! );</PRE> ! <P>You can also switch the feedback to DEBUG mode, to get extra details. Check <A class="wiki" HREF="EnableFeedback.html">EnableFeedback</A>.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:40:45 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 15,32 ---- DefaultParserFeedback.QUIET ) ! ); ! ! <p>You can also switch the feedback to DEBUG mode, to get extra details. Check <a HREF=EnableFeedback.html class="wiki">EnableFeedback</a>. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! 
<p class="editdate">Last edited on Sunday, February 23, 2003 5:40:45 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: CollectingParameter.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/CollectingParameter.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** CollectingParameter.html 24 Aug 2003 18:44:10 -0000 1.3 --- CollectingParameter.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,8 **** ! <html><head><title>Collecting Parameter</title></head><body><DIV class="wikitext"> ! <P><B>Collecting Parameter</B></P> ! <P>The parser allows the use of a collecting parameter in two modes</P> ! <UL> ! <LI>a direct call to <I>extractAllNodesThatAre()</I></LI> ! <LI>Node.collectInto() during external iteration</LI></UL> ! <P>Either way, nodes are collected into the collecting parameter object if they satisfy a match criterion (usually the type).</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:40:12 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,30 ---- ! <html><head><title>Collecting Parameter</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Collecting Parameter ! ! <p>The parser allows the use of a collecting parameter in two modes ! ! <ul> ! ! <li>a direct call to <i>extractAllNodesThatAre() ! ! <li>Node.collectInto() during external iteration ! ! ! <p>Either way, nodes are collected into the collecting parameter object if they satisfy a match criterion (usually the type). ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! 
<p class="editdate">Last edited on Sunday, February 23, 2003 5:40:12 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: CompositePattern.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/CompositePattern.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** CompositePattern.html 24 Aug 2003 18:44:10 -0000 1.3 --- CompositePattern.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,4 **** ! <html><head><title>Composite Pattern</title></head><body><DIV class="wikitext"> ! <P><B>Composite Pattern</B></P> ! <P>The Composite can be seen in action in the <I><SPAN class="wikiunknown"><U>CompositeTag</U></SPAN></I> class. All tags that can have children subclass <I><SPAN class="wikiunknown"><U>CompositeTag</U></SPAN></I>, which contains methods for iterating over these children in a uniform way. A <SPAN class="wikiunknown"><U>CompositeTag</U></SPAN> can be composed of leaf nodes or <I><SPAN class="wikiunknown"><U>CompositeTag</U></SPAN></I>s.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 16, 2003 4:52:03 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,21 ---- ! <html><head><title>Composite Pattern</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Composite Pattern ! ! <p>The Composite can be seen in action in the <i><span class="wikiunknown"><u>CompositeTag class. All tags that can have children subclass <i><span class="wikiunknown"><u>CompositeTag, which contains methods for iterating over these children in a uniform way. A <span class="wikiunknown"><u>CompositeTag can be composed of leaf nodes or <i><span class="wikiunknown"><u>CompositeTags. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! 
<div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Sunday, February 16, 2003 4:52:03 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: CustomTagExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/CustomTagExtraction.html,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** CustomTagExtraction.html 24 Aug 2003 18:44:10 -0000 1.5 --- CustomTagExtraction.html 26 Oct 2003 19:46:17 -0000 1.6 *************** *** 1,6 **** ! <html><head><title>Custom Tag Extraction</title></head><body><DIV class="wikitext"> ! <P><B>Custom Tag Extraction</B></P> ! <P>Custom tag extraction is easy. Simply create an array of tag names that you want to extract from a page, and pass it in to <A class="wiki" HREF="TagFindingVisitor.html">TagFindingVisitor</A>, like so :</P> ! <PRE>Parser parser = new Parser(..); String [] tagsToBeFound = {"P","BR","MYTAG"}; TagFindingVisitor visitor = new TagFindingVisitor(tagsToBeFound); --- 1,13 ---- ! <html><head><title>Custom Tag Extraction</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Custom Tag Extraction ! ! <p>Custom tag extraction is easy. Simply create an array of tag names that you want to extract from a page, and pass it in to <a HREF=TagFindingVisitor.html class="wiki">TagFindingVisitor</a>, like so : ! ! <pre> ! Parser parser = new Parser(..); String [] tagsToBeFound = {"P","BR","MYTAG"}; TagFindingVisitor visitor = new TagFindingVisitor(tagsToBeFound); *************** *** 11,14 **** Node [] allBRTags = visitor.getTags(1); // Third tag specified in search ! Node [] allMyTags = visitor.getTags(2);</PRE> ! 
<P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A>// Just a test of wiki</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Wednesday, April 2, 2003 1:38:24 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 18,34 ---- Node [] allBRTags = visitor.getTags(1); // Third tag specified in search ! Node [] allMyTags = visitor.getTags(2); ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! // Just a test of wiki ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Wednesday, April 2, 2003 1:38:24 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: EmailExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/EmailExtraction.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** EmailExtraction.html 24 Aug 2003 18:44:10 -0000 1.3 --- EmailExtraction.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>Email Extraction</title></head><body><DIV class="wikitext"> ! <P><B>Email Extraction</B></P> ! <P>This is very similar to link extraction. You have to extract links from a page and verify that they are email addresses. Link tags have a method - <I>isMailLink()</I></P> ! <PRE> Parser parser = new Parser(..); parser.registerScanners(); Node links [] = parser.extractAllNodesThatAre(LinkTag.class); --- 1,13 ---- ! <html><head><title>Email Extraction</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Email Extraction ! ! <p>This is very similar to link extraction. You have to extract links from a page and verify that they are email addresses. Link tags have a method - <i>isMailLink() ! ! <pre> ! 
Parser parser = new Parser(..); parser.registerScanners(); Node links [] = parser.extractAllNodesThatAre(LinkTag.class); *************** *** 11,14 **** System.out.println("Email address: "+linkTag.getLink()); } ! }</PRE> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A>, February 16, 2003 11:41 am</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:24:25 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 18,33 ---- System.out.println("Email address: "+linkTag.getLink()); } ! } ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a>, February 16, 2003 11:41 am ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Sunday, February 23, 2003 5:24:25 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: EnableFeedback.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/EnableFeedback.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** EnableFeedback.html 24 Aug 2003 18:44:10 -0000 1.3 --- EnableFeedback.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>Enable Feedback</title></head><body><DIV class="wikitext"> ! <P><B>Enable Feedback</B></P> ! <P>If the parser needs to be switched to normal or debug mode, you can do this like so:</P> ! <PRE> Parser parser = new Parser( "http://...", --- 1,13 ---- ! <html><head><title>Enable Feedback</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Enable Feedback ! ! <p>If the parser needs to be switched to normal or debug mode, you can do this like so: ! ! <pre> ! Parser parser = new Parser( "http://...", *************** *** 17,21 **** ) ); ! </PRE> ! 
<P>You can also turn the feedback to QUIET mode (none of the events will be triggered), to get extra details. Check <A class="wiki" HREF="BlockFeedback.html">BlockFeedback</A>. To handle the feedback yourself, without displaying it to standard output, subclass <SPAN class="wikiunknown"><U>ParserFeedback</U></SPAN>, and override <I>info()</I>, <I>warning()</I> and <I>error()</I>.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:41:24 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 24,41 ---- ) ); ! ! ! <p>You can also turn the feedback to QUIET mode (none of the events will be triggered), to get extra details. Check <a HREF=BlockFeedback.html class="wiki">BlockFeedback</a>. To handle the feedback yourself, without displaying it to standard output, subclass <span class="wikiunknown"><u>ParserFeedback, and override <i>info(), <i>warning() and <i>error(). ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Sunday, February 23, 2003 5:41:24 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: ExternalIterators.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ExternalIterators.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** ExternalIterators.html 24 Aug 2003 18:44:10 -0000 1.3 --- ExternalIterators.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>External Iterators</title></head><body><DIV class="wikitext"> ! <P><B>External Iterators</B></P> ! <P>You can use external iterators to drive the entire parsing process like so :</P> ! 
<PRE> for (NodeIterator i = parser.elements();i.hasMoreNodes();) { Node node = e.nextNode(); if (node instanceof LinkTag) { --- 1,13 ---- ! <html><head><title>External Iterators</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>External Iterators ! ! <p>You can use external iterators to drive the entire parsing process like so : ! ! <pre> ! for (NodeIterator i = parser.elements();i.hasMoreNodes();) { Node node = e.nextNode(); if (node instanceof LinkTag) { *************** *** 8,12 **** if (node instanceof ImageTag) { } ! }</PRE> ! <P>You should think of this only when you want to conduct a really quick search, and the moment you've found what you've wanted, you want to stop parsing. The iterator here drives the parsing.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:36:09 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 15,32 ---- if (node instanceof ImageTag) { } ! } ! ! <p>You should think of this only when you want to conduct a really quick search, and the moment you've found what you've wanted, you want to stop parsing. The iterator here drives the parsing. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Sunday, February 23, 2003 5:36:09 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: FactoryMethod.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FactoryMethod.html,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** FactoryMethod.html 24 Aug 2003 18:44:10 -0000 1.4 --- FactoryMethod.html 26 Oct 2003 19:46:17 -0000 1.5 *************** *** 1,9 **** ! 
<html><head><title>Factory Method</title></head><body><DIV class="wikitext"> ! <P><B>Factory Method</B></P> ! <P><I><A class="wiki" HREF="TagScanner.html">TagScanner</A></I> possess an FM for the creation of a tag.</P> ! <PRE> protected Tag createTag(TagData tagData);</PRE> ! <P>Scanner subclasses override this to specify the type of tag to be constructed.</P> ! <P><I><SPAN class="wikiunknown"><U>CompositeTagScanner</U></SPAN></I> possesses an FM for the creation of a tag.</P> ! <PRE> protected Tag createTag(TagData tagData,CompositeTagData compositeTagData);</PRE> ! <P>Composite scanners override this to specify the type of tag to be constructed.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:37:36 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,33 ---- ! <html><head><title>Factory Method</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Factory Method ! ! <p><i><a HREF=TagScanner.html class="wiki">TagScanner</a> possess an FM for the creation of a tag. ! ! <pre> ! protected Tag createTag(TagData tagData); ! ! <p>Scanner subclasses override this to specify the type of tag to be constructed. ! ! <p><i><span class="wikiunknown"><u>CompositeTagScanner possesses an FM for the creation of a tag. ! ! <pre> ! protected Tag createTag(TagData tagData,CompositeTagData compositeTagData); ! ! <p>Composite scanners override this to specify the type of tag to be constructed. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Sunday, February 23, 2003 5:37:36 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! 
</body></html> \ No newline at end of file Index: FeedbackMechanism.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FeedbackMechanism.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** FeedbackMechanism.html 24 Aug 2003 18:44:10 -0000 1.3 --- FeedbackMechanism.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,18 **** ! <html><head><title>Feedback Mechanism</title></head><body><DIV class="wikitext"> ! <P><B>Feedback Mechanism</B></P> ! <P>The parser has a feedback mechanism that allows you to obtain feedback about the parsing process. You can get to know if there were any errors, or any warnings, or any general information. Warnings occur when the parser has encountered dirty html, but was able to fix it and continue. Errors occur when the parser was not able to handle the html.</P> ! <P>An understanding of the feedback mechanism is useful if you wish to perform logging, or turn off the default feedback and incorporate your own.</P> ! <P>When you create a parser object without specifying any feedback object, the parser creates a default feedback object - DefaultHTMLParserFeedback. This works in three modes - NORMAL, QUIET and DEBUG, and when no feedback object is specified, it defaults to normal. In this mode, all information, warnings and errors are sent to standard output.</P> ! <PRE>HTMLParser parser = new HTMLParser(someUrl);</PRE> ! <P>The above code snippet shows the default configuration - the feedback object is created in the normal mode. You can turn off the messages by turning the feedback mechanism to the quiet mode. This can be done in two ways :</P> ! <PRE>HTMLParser parser = new HTMLParser(someUrl,null); ! Java2html</PRE> ! <P>or</P> ! <PRE>HTMLParser parser = new HTMLParser(someUrl,new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.QUIET));</PRE> ! <P>In this mode, there is no feedback on standard output. ! 
For debugging purposes, you can use the debug mode to receive all stack traces of exceptions that are thrown.</P> ! <PRE>HTMLParser parser = new HTMLParser(someUrl,new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG));</PRE> ! <P>If you wish to add a file logger- you can write your own custom feedback class like this :</P> ! <PRE>public class FileFeedback implements HTMLParserFeedback{ public FileFeedback(String file) { // .. Initialize the file for logging --- 1,39 ---- ! <html><head><title>Feedback Mechanism</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Feedback Mechanism ! ! <p>The parser has a feedback mechanism that allows you to obtain feedback about the parsing process. You can get to know if there were any errors, or any warnings, or any general information. Warnings occur when the parser has encountered dirty html, but was able to fix it and continue. Errors occur when the parser was not able to handle the html. ! ! <p>An understanding of the feedback mechanism is useful if you wish to perform logging, or turn off the default feedback and incorporate your own. ! ! <p>When you create a parser object without specifying any feedback object, the parser creates a default feedback object - DefaultHTMLParserFeedback. This works in three modes - NORMAL, QUIET and DEBUG, and when no feedback object is specified, it defaults to normal. In this mode, all information, warnings and errors are sent to standard output. ! ! <pre> ! HTMLParser parser = new HTMLParser(someUrl); ! ! <p>The above code snippet shows the default configuration - the feedback object is created in the normal mode. You can turn off the messages by turning the feedback mechanism to the quiet mode. This can be done in two ways : ! ! <pre> ! HTMLParser parser = new HTMLParser(someUrl,null); ! Java2html ! ! <p>or ! ! <pre> ! HTMLParser parser = new HTMLParser(someUrl,new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.QUIET)); ! ! 
<p>In this mode, there is no feedback on standard output. ! For debugging purposes, you can use the debug mode to receive all stack traces of exceptions that are thrown. ! ! <pre> ! HTMLParser parser = new HTMLParser(someUrl,new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG)); ! ! <p>If you wish to add a file logger- you can write your own custom feedback class like this : ! ! <pre> ! public class FileFeedback implements HTMLParserFeedback{ public FileFeedback(String file) { // .. Initialize the file for logging *************** *** 27,30 **** // .. log the error message } ! }</PRE> ! <P>You can supply an object of this type to the parser in the constructor, and accordingly channel the feedback.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, March 21, 2003 11:51:12 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 48,63 ---- // .. log the error message } ! } ! ! <p>You can supply an object of this type to the parser in the constructor, and accordingly channel the feedback. ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Friday, March 21, 2003 11:51:12 am. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: FirstName.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FirstName.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** FirstName.html 24 Aug 2003 18:44:10 -0000 1.2 --- FirstName.html 26 Oct 2003 19:46:17 -0000 1.3 *************** *** 1,2 **** ! <html><head><title>First Name</title></head><body><DIV class="wikitext"> ! 
<P>Describe <A class="wiki" HREF="FirstName.html">FirstName</A> here.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, July 17, 2003 4:35:59 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,17 ---- ! <html><head><title>First Name</title></head><body> ! ! ! ! <div class="wikitext"> ! <p>Describe <a HREF=FirstName.html class="wiki">FirstName</a> here. ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Thursday, July 17, 2003 4:35:59 am. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: FrequentlyAskedQuestions.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FrequentlyAskedQuestions.html,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** FrequentlyAskedQuestions.html 24 Aug 2003 18:44:10 -0000 1.5 --- FrequentlyAskedQuestions.html 26 Oct 2003 19:46:17 -0000 1.6 *************** *** 1,7 **** ! <html><head><title>Frequently Asked Questions</title></head><body><DIV class="wikitext"> ! <P><B>FAQ</B></P><HR/> ! <P><B>How does the parser deal with tags like <tag/> ?</B></P> ! <P>The parser handles them as a normal Tag object. The Tag class has a method - isEmptyXmlTag() which can be queried to find if this an empty xml tag.</P><HR/> ! <P><B>How does the parser deal with HTML tags which should be terminated with /> but are not, i.e. <BR/> and <HR>? Is there any way to automatically know that some HTML tags are empty?</B></P><HR/> ! <P><B>How is JSP parsed using the HTMLParser?</B></P><HR/> ! 
<P><B>How do you find the byte offset from the beginning of a document for a tag?</B></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, June 19, 2003 10:49:11 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,32 ---- ! <html><head><title>Frequently Asked Questions</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>FAQ ! ! <hr /> ! <p><b>How does the parser deal with tags like <tag/> ? ! ! <p>The parser handles them as a normal Tag object. The Tag class has a method - isEmptyXmlTag() which can be queried to find if this an empty xml tag. ! ! <hr /> ! <p><b>How does the parser deal with HTML tags which should be terminated with /> but are not, i.e. ! <br /> and <HR>? Is there any way to automatically know that some HTML tags are empty? ! ! <hr /> ! <p><b>How is JSP parsed using the HTMLParser? ! ! <hr /> ! <p><b>How do you find the byte offset from the beginning of a document for a tag? ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Thursday, June 19, 2003 10:49:11 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: FullName.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FullName.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** FullName.html 24 Aug 2003 18:44:10 -0000 1.1 --- FullName.html 26 Oct 2003 19:46:17 -0000 1.2 *************** *** 1,2 **** ! <html><head><title>Full Name</title></head><body><DIV class="wikitext"> ! 
<P>Describe [<SPAN class="wikiunknown"><U>FullNa</U></SPAN></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, August 15, 2003 10:24:19 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,17 ---- ! <html><head><title>Full Name</title></head><body> ! ! ! ! <div class="wikitext"> ! <p>Describe [<span class="wikiunknown"><u>FullNa ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Friday, August 15, 2003 10:24:19 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: ImageExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ImageExtraction.html,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** ImageExtraction.html 24 Aug 2003 18:44:10 -0000 1.4 --- ImageExtraction.html 26 Oct 2003 19:46:17 -0000 1.5 *************** *** 1,7 **** ! <html><head><title>Image Extraction</title></head><body><DIV class="wikitext"> ! <P><B>Image Extractions</B></P> ! <P>This is very similar to <A class="wiki" HREF="LinkExtraction.html">LinkExtraction</A>.</P> ! <P>1. Use the <I><SPAN class="wikiunknown"><U>ObjectFindingVisitor</U></SPAN></I> like so :</P> ! <PRE>Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children // Recursion is needed only if you register all scanners, and a link tag could be embedded --- 1,15 ---- ! <html><head><title>Image Extraction</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Image Extractions ! ! <p>This is very similar to <a HREF=LinkExtraction.html class="wiki">LinkExtraction</a>. ! ! <p>1. Use the <i><span class="wikiunknown"><u>ObjectFindingVisitor like so : ! ! <pre> ! 
Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children // Recursion is needed only if you register all scanners, and a link tag could be embedded *************** *** 19,25 **** ImageTag imageTag = (ImageTag)images[i]; System.out.println(imageTag.getImageLocation()); ! }</PRE> ! <P>2: Use <I>extractAllNodesThatAre()</I></P> ! <PRE> Parser parser = new Parser("http://urlIWantToParse.com"); parser.registerScanners(); // Instead of registering all scanners, --- 27,36 ---- ImageTag imageTag = (ImageTag)images[i]; System.out.println(imageTag.getImageLocation()); ! } ! ! <p>2: Use <i>extractAllNodesThatAre() ! ! <pre> ! Parser parser = new Parser("http://urlIWantToParse.com"); parser.registerScanners(); // Instead of registering all scanners, *************** *** 30,33 **** ImageTag imageTag = (ImageTag)images[i]; System.out.println(imageTag.getImageLocation()); ! }</PRE> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A>, Sunday, February 16, 2003 2:02:18 pm.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Wednesday, June 25, 2003 9:11:46 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 41,56 ---- ImageTag imageTag = (ImageTag)images[i]; System.out.println(imageTag.getImageLocation()); ! } ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a>, Sunday, February 16, 2003 2:02:18 pm. ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Wednesday, June 25, 2003 9:11:46 am. ! ! <hr class="toolbar" noshade="noshade" /> ! 
</body></html> \ No newline at end of file Index: InternalIterators.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/InternalIterators.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** InternalIterators.html 24 Aug 2003 18:44:10 -0000 1.3 --- InternalIterators.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,4 **** ! <html><head><title>Internal Iterators</title></head><body><DIV class="wikitext"> ! <P><B>Internal Iterators</B></P> ! <P>You can use internal iterators by overriding trigger methods that you're interested in. This is done by subclassing HTMLVisitor. An example can be found in <A class="wiki" HREF="LinkExtraction.html">LinkExtraction</A>.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 16, 2003 4:08:46 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,21 ---- ! <html><head><title>Internal Iterators</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Internal Iterators ! ! <p>You can use internal iterators by overriding trigger methods that you're interested in. This is done by subclassing HTMLVisitor. An example can be found in <a HREF=LinkExtraction.html class="wiki">LinkExtraction</a>. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Sunday, February 16, 2003 4:08:46 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! 
</body></html> \ No newline at end of file Index: IteratorPattern.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/IteratorPattern.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** IteratorPattern.html 24 Aug 2003 18:44:10 -0000 1.3 --- IteratorPattern.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>Iterator Pattern</title></head><body><DIV class="wikitext"> ! <P><B>Iterator Pattern</B></P> ! <P>The Iterator can be seen in action in two of its flavors - <A class="wiki" HREF="ExternalIterators.html">ExternalIterators</A>, and <A class="wiki" HREF="InternalIterators.html">InternalIterators</A>. ! The <I>HTMLEnumeration</I> class provides the external iteration facility. ! <I><SPAN class="wikiunknown"><U>SimpleEnumeration</U></SPAN></I> allows external iteration over <I><SPAN class="wikiunknown"><U>NodeList</U></SPAN></I>s.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 16, 2003 5:04:10 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,23 ---- ! <html><head><title>Iterator Pattern</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Iterator Pattern ! ! <p>The Iterator can be seen in action in two of its flavors - <a HREF=ExternalIterators.html class="wiki">ExternalIterators</a>, and <a HREF=InternalIterators.html class="wiki">InternalIterators</a>. ! The <i>HTMLEnumeration class provides the external iteration facility. ! <i><span class="wikiunknown"><u>SimpleEnumeration allows external iteration over <i><span class="wikiunknown"><u>NodeLists. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! 
<p class="editdate">Last edited on Sunday, February 16, 2003 5:04:10 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: JavaBeans.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/JavaBeans.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** JavaBeans.html 24 Aug 2003 18:44:10 -0000 1.3 --- JavaBeans.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,19 **** ! <html><head><title>Java Beans</title></head><body><DIV class="wikitext"> ! <P><B>Quick Introduction</B></P> ! <P>Run the example program that demonstrates the capabilities of the Java Beans that are already included in the htmparser.jar (it's assumed that the htmlparser.jar file from an integration build 1.3 later than April 12, 2003 is in your current directory):</P> ! <PRE>java -classpath htmlparser.jar org.htmlparser.beans.BeanyBaby</PRE> ! <P>What you should see is a split window showing a URL extraction with a list of links on the left and the text on the right.<BR/></P> ! <P><IMG alt="http://htmlparser.sourceforge.net/images/BeanyBaby.jpg" SRC="images/BeanyBaby.jpg" class="inlineimage"><BR/></P> ! <P>The splitter on the left contains a GUI oriented <TT>HTMLLinkBean</TT> (which uses an underlying API <TT>LinkBean</TT>) and the splitter on the right contains a GUI oriented <TT>HTMLStringBean</TT> (which uses an underlying API <TT>StringBean</TT>).<BR/></P> ! <P>Type in a URL or double-click a URL from the list. Use the Go menu to go back to a previous link or step to the next link you already visited.</P> ! <P>The options menu provides access to the binary properties:<BR/></P> ! <P><IMG alt="http://htmlparser.sourceforge.net/images/BeanyBabyOptions.jpg" SRC="images/BeanyBabyOptions.jpg" class="inlineimage"><BR/></P> ! <UL> ! <LI>Links - turn on and off the extraction of hyperlinks with the text</LI> ! 
<LI>Collapse - turn on and off collapsing whitespace</LI> ! <LI>Non-Breaking Spaces - turn on and off transforming non-break spaces into regular spaces</LI></UL> ! <P><B>Simple Usage</B></P> ! <P>The simplest operation (this shows StringBean use, but LinkBean use is similar) is just to create a new one, set the URL and then get the text:<BR/></P> ! <PRE>#import org.htmlparser.beans.StringBean; public class TryBeans --- 1,47 ---- ! <html><head><title>Java Beans</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Quick Introduction ! ! <p>Run the example program that demonstrates the capabilities of the Java Beans that are already included in the htmparser.jar (it's assumed that the htmlparser.jar file from an integration build 1.3 later than April 12, 2003 is in your current directory): ! ! <pre> ! java -classpath htmlparser.jar org.htmlparser.beans.BeanyBaby ! ! <p>What you should see is a split window showing a URL extraction with a list of links on the left and the text on the right. ! <br /> ! ! <p><img SRC="images/BeanyBaby.jpg" alt="http://htmlparser.sourceforge.net/images/BeanyBaby.jpg" class="inlineimage" /> ! <br /> ! ! <p>The splitter on the left contains a GUI oriented <tt>HTMLLinkBean (which uses an underlying API <tt>LinkBean) and the splitter on the right contains a GUI oriented <tt>HTMLStringBean (which uses an underlying API <tt>StringBean). ! <br /> ! ! <p>Type in a URL or double-click a URL from the list. Use the Go menu to go back to a previous link or step to the next link you already visited. ! ! <p>The options menu provides access to the binary properties: ! <br /> ! ! <p><img SRC="images/BeanyBabyOptions.jpg" alt="http://htmlparser.sourceforge.net/images/BeanyBabyOptions.jpg" class="inlineimage" /> ! <br /> ! ! <ul> ! ! <li>Links - turn on and off the extraction of hyperlinks with the text ! ! <li>Collapse - turn on and off collapsing whitespace ! ! 
<li>Non-Breaking Spaces - turn on and off transforming non-break spaces into regular spaces ! ! ! <p><b>Simple Usage ! ! <p>The simplest operation (this shows StringBean use, but LinkBean use is similar) is just to create a new one, set the URL and then get the text: ! <br /> ! ! <pre> ! #import org.htmlparser.beans.StringBean; public class TryBeans *************** *** 22,45 **** { StringBean sb = new StringBean (); ! sb.setURL ("<A class="namedurl" href="http://cbc.ca"><SPAN style="white-space: nowrap">http://cbc.ca</SPAN></A>"); System.out.println (sb.getStrings ()); } ! }</PRE> ! <P>Save this in a file called TryBeans.java and then run the following commands:</P> ! <PRE>javac -classpath htmlparser.jar TryBeans.java ! java -classpath htmlparser.jar:. TryBeans</PRE> ! <P>or for Windows:</P> ! <PRE>java -classpath htmlparser.jar;. TryBeans</PRE> ! <P><B>Simple GUI Usage</B></P> ! <P>The following instructions are for the <A class="namedurl" href="http://www.netbeans.org"><SPAN style="white-space: nowrap">NetBeans</SPAN></A> IDE but other environments will have a similar operation.</P> ! <P>You can mount the htmlparser.jar file:<BR/></P> ! <P><IMG alt="http://htmlparser.sourceforge.net/images/Mount.jpg" SRC="images/Mount.jpg" class="inlineimage"><BR/></P> ! <P>and use the bean classes directly or if you want to use them in the Form designer you'll need to install them. Use the Install New Javabean menu item in the Tools menu:<BR/></P> ! <P><IMG alt="http://htmlparser.sourceforge.net/images/InstallBean.jpg" SRC="images/InstallBean.jpg" class="inlineimage"><BR/></P> ! <P>There are a number of beans in the jar, as indicated above the GUI beans are the HTMLStringBean and HTMLLinkBean. You can install them all, but it might clutter up your palette a bit, so I would recomend only install the ones you need for the project at hand. You'll also need to specify the palette that the beans will be added to:</P> ! 
<P><IMG alt="http://htmlparser.sourceforge.net/images/ChooseBean.jpg" SRC="images/ChooseBean.jpg" class="inlineimage"><IMG alt="http://htmlparser.sourceforge.net/images/ChoosePalette.jpg" SRC="images/ChoosePalette.jpg" class="inlineimage"><BR/></P> ! <P>Once the bean is installed it will show up on the tool palette and you can click it and drop it onto a JFrame or JPanel or whatever:<BR/></P> ! <P><IMG alt="http://htmlparser.sourceforge.net/images/AddingBean.jpg" SRC="images/AddingBean.jpg" class="inlineimage"><BR/></P> ! <P>Once it's in your designer you can set the properties and have it display the text even while designing (assuming you're online):<BR/></P> ! <P><IMG alt="http://htmlparser.sourceforge.net/images/SettingProperties.jpg" SRC="images/SettingProperties.jpg" class="inlineimage"><BR/></P> ! <P>Of course you can subclass the provided beans or write your own.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Saturday, April 5, 2003 7:25:55 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 50,113 ---- { StringBean sb = new StringBean (); ! sb.setURL ("<a href="http://cbc.ca" class="namedurl"><span style="white-space: nowrap">http://cbc.ca</span></a>"); System.out.println (sb.getStrings ()); } ! } ! ! <p>Save this in a file called TryBeans.java and then run the following commands: ! ! <pre> ! javac -classpath htmlparser.jar TryBeans.java ! java -classpath htmlparser.jar:. TryBeans ! ! <p>or for Windows: ! ! <pre> ! java -classpath htmlparser.jar;. TryBeans ! ! <p><b>Simple GUI Usage ! ! <p>The following instructions are for the <a href="http://www.netbeans.org" class="namedurl"><span style="white-space: nowrap">NetBeans</span></a> IDE but other environments will have a similar operation. ! ! <p>You can mount the htmlparser.jar file: ! <br /> ! ! 
<p><img SRC="images/Mount.jpg" alt="http://htmlparser.sourceforge.net/images/Mount.jpg" class="inlineimage" /> ! <br /> ! ! <p>and use the bean classes directly or if you want to use them in the Form designer you'll need to install them. Use the Install New Javabean menu item in the Tools menu: ! <br /> ! ! <p><img SRC="images/InstallBean.jpg" alt="http://htmlparser.sourceforge.net/images/InstallBean.jpg" class="inlineimage" /> ! <br /> ! ! <p>There are a number of beans in the jar, as indicated above the GUI beans are the HTMLStringBean and HTMLLinkBean. You can install them all, but it might clutter up your palette a bit, so I would recomend only install the ones you need for the project at hand. You'll also need to specify the palette that the beans will be added to: ! ! <p><img SRC="images/ChooseBean.jpg" alt="http://htmlparser.sourceforge.net/images/ChooseBean.jpg" class="inlineimage" /> ! <img SRC="images/ChoosePalette.jpg" alt="http://htmlparser.sourceforge.net/images/ChoosePalette.jpg" class="inlineimage" /> ! <br /> ! ! <p>Once the bean is installed it will show up on the tool palette and you can click it and drop it onto a JFrame or JPanel or whatever: ! <br /> ! ! <p><img SRC="images/AddingBean.jpg" alt="http://htmlparser.sourceforge.net/images/AddingBean.jpg" class="inlineimage" /> ! <br /> ! ! <p>Once it's in your designer you can set the properties and have it display the text even while designing (assuming you're online): ! <br /> ! ! <p><img SRC="images/SettingProperties.jpg" alt="http://htmlparser.sourceforge.net/images/SettingProperties.jpg" class="inlineimage" /> ! <br /> ! ! <p>Of course you can subclass the provided beans or write your own. ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Saturday, April 5, 2003 7:25:55 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! 
</body></html> \ No newline at end of file Index: LastName.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/LastName.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** LastName.html 24 Aug 2003 18:44:10 -0000 1.2 --- LastName.html 26 Oct 2003 19:46:17 -0000 1.3 *************** *** 1,2 **** ! <html><head><title>Last Name</title></head><body><DIV class="wikitext"> ! <P>Describe <A class="wiki" HREF="LastName.html">LastName</A> here.fdsadfsafdsaf</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, July 17, 2003 4:38:05 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,17 ---- ! <html><head><title>Last Name</title></head><body> ! ! ! ! <div class="wikitext"> ! <p>Describe <a HREF=LastName.html class="wiki">LastName</a> here.fdsadfsafdsaf ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Thursday, July 17, 2003 4:38:05 am. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: LinkExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/LinkExtraction.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** LinkExtraction.html 24 Aug 2003 18:44:10 -0000 1.3 --- LinkExtraction.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,7 **** ! <html><head><title>Link Extraction</title></head><body><DIV class="wikitext"> ! <P><B>Link Extraction</B></P> ! <P>There are many ways of extracting links.</P> ! <P>1. Use the <SPAN class="wikiunknown"><U>ObjectFindingVisitor</U></SPAN> to extract links, like so:</P> ! 
<PRE> Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children // Recursion is needed only if you register all scanners, and a link tag could be embedded --- 1,15 ---- ! <html><head><title>Link Extraction</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Link Extraction ! ! <p>There are many ways of extracting links. ! ! <p>1. Use the <span class="wikiunknown"><u>ObjectFindingVisitor to extract links, like so: ! ! <pre> ! Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children // Recursion is needed only if you register all scanners, and a link tag could be embedded *************** *** 20,26 **** System.out.println(linkTag.getLink()); System.out.println(linkTag.getLinkText()); ! }</PRE> ! <P>2. Use the parser utility method - extractAllNodesThatAre().</P> ! <PRE> Parser parser = new Parser("http://urlIWantToParse.com"); parser.registerScanners(); Node [] links = parser.extractAllNodesThatAre(LinkTag.class); --- 28,37 ---- System.out.println(linkTag.getLink()); System.out.println(linkTag.getLinkText()); ! } ! ! <p>2. Use the parser utility method - extractAllNodesThatAre(). ! ! <pre> ! Parser parser = new Parser("http://urlIWantToParse.com"); parser.registerScanners(); Node [] links = parser.extractAllNodesThatAre(LinkTag.class); *************** *** 31,37 **** System.out.println(linkTag.getLink()); System.out.println(linkTag.getLinkText()); ! }</PRE> ! <P>3. It is possible that you are interested in extracting more than just links. In order to customize extraction, write your own visitor. Extend the Visitor class (in the package org.htmlparser.visitors - Parser v1.3 upwards) like so :</P> ! 
<PRE> public class MyCustomizedVisitor extends Visitor { public MyCustomizedVisitor(Parser parser) { super(true); /// Its usually a good idea to perform recursion --- 42,51 ---- System.out.println(linkTag.getLink()); System.out.println(linkTag.getLinkText()); ! } ! ! <p>3. It is possible that you are interested in extracting more than just links. In order to customize extraction, write your own visitor. Extend the Visitor class (in the package org.htmlparser.visitors - Parser v1.3 upwards) like so : ! ! <pre> ! public class MyCustomizedVisitor extends Visitor { public MyCustomizedVisitor(Parser parser) { super(true); /// Its usually a good idea to perform recursion *************** *** 78,84 **** In your app.. Parser parser = new Parser(...); ! MyCustomizedVisitor visitor = new MyCustomizedVisitor(); parser.visitAllNodesWith(visitor); // You can now get the data from the visitor interface. ! </PRE> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:22:44 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 92,110 ---- In your app.. Parser parser = new Parser(...); ! MyCustomizedVisitor visitor = new MyCustomizedVisitor(parser); parser.visitAllNodesWith(visitor); // You can now get the data from the visitor interface. ! ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Tuesday, September 2, 2003 1:59:15 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! 
</body></html> \ No newline at end of file Index: ParserDesign.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ParserDesign.html,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** ParserDesign.html 24 Aug 2003 18:44:10 -0000 1.4 --- ParserDesign.html 26 Oct 2003 19:46:17 -0000 1.5 *************** *** 1,6 **** ! <html><head><title>Parser Design</title></head><body><DIV class="wikitext"> ! <P><B>Parser Design</B></P> ! <P>HTMLParser is a SAX-like parser streaming parser, that has the capability to correct dirty-html on the fly. It is extremely fast and lightweight. The binary distribution of the jar file is around 135 KB only, and it can easily be brought down to 65 KB for a minimal parsing requirement (prior to optimization and obfuscation).</P> ! <P>It is also extensible. The parser provides both <A class="wiki" HREF="InternalIterators.html">InternalIterators</A> and <A class="wiki" HREF="ExternalIterators.html">ExternalIterators</A>. ! The parser has some interesting <A class="wiki" HREF="PatternStories.html">PatternStories</A>..</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Monday, March 17, 2003 6:18:45 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,24 ---- ! <html><head><title>Parser Design</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Parser Design ! ! <p>HTMLParser is a SAX-like parser streaming parser, that has the capability to correct dirty-html on the fly. It is extremely fast and lightweight. The binary distribution of the jar file is around 135 KB only, and it can easily be brought down to 65 KB for a minimal parsing requirement (prior to optimization and obfuscation). ! ! <p>It is also extensible. 
The parser provides both <a HREF=InternalIterators.html class="wiki">InternalIterators</a> and <a HREF=ExternalIterators.html class="wiki">ExternalIterators</a>. ! The parser has some interesting <a HREF=PatternStories.html class="wiki">PatternStories</a>.. ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Monday, March 17, 2003 6:18:45 am. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: ParsingXml.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ParsingXml.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** ParsingXml.html 24 Aug 2003 18:44:10 -0000 1.3 --- ParsingXml.html 26 Oct 2003 19:46:17 -0000 1.4 *************** *** 1,1813 **** ! <html><head><title>Parsing Xml</title></head><body><DIV class="wikitext"> ! <P><?xml version="1.0" encoding="iso-8859-1" ?></P><BLOCKQUOTE style="border-left-width: medium; border-left-color: #0f0; border-left-style: ridge; padding-left: 1em; margin-left: 0em; margin-right: 0em;"> ! <BLOCKQUOTE> ! <P><<SPAN class="wikiunknown"><U>ReviewerInformation</U></SPAN>></P> ! <P><Reviewer></P> ! <P><PeopleID>9</PeopleID></P> ! <P><<A class="wiki" HREF="FirstName.html">FirstName</A>>Niall</<A class="wiki" HREF="FirstName.html">FirstName</A>></P> ! <P><<A class="wiki" HREF="LastName.html">LastName</A>>Adams</<A class="wiki" HREF="LastName.html">LastName</A>></P> ! <P><<A class="wiki" HREF="FullName.html">FullName</A>>Niall Adams</<A class="wiki" HREF="FullName.html">FullName</A>></P> ! <P><Organization>Imperial College</Organization></P> [...5429 lines suppressed...] ! ! <p><Fax>509-479-4522</Fax> ! ! <p></Reviewer> ! ! <p></<span class="wikiunknown"><u>ReviewerInformation> ! ! ! ! ! ! <div id="actionbar" class="toolbar"> ! ! 
<hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Tuesday, June 24, 2003 1:32:51 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: PatternStories.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/PatternStories.html,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** PatternStories.html 24 Aug 2003 18:44:10 -0000 1.4 --- PatternStories.html 26 Oct 2003 19:46:17 -0000 1.5 *************** *** 1,12 **** ! <html><head><title>Pattern Stories</title></head><body><DIV class="wikitext"> ! <P><B>Pattern Stories</B></P> ! <P>The parser uses the following patterns:</P> ! <UL> ! <LI><A class="wiki" HREF="FactoryMethod.html">FactoryMethod</A></LI> ! <LI><A class="wiki" HREF="TemplateMethod.html">TemplateMethod</A></LI> ! <LI><A class="wiki" HREF="IteratorPattern.html">IteratorPattern</A></LI> ! <LI><A class="wiki" HREF="VisitorPattern.html">VisitorPattern</A></LI> ! <LI><A class="wiki" HREF="CollectingParameter.html">CollectingParameter</A></LI> ! <LI><A class="wiki" HREF="StrategyPattern.html">StrategyPattern</A></LI> ! <LI><A class="wiki" HREF="CompositePattern.html">CompositePattern</A></LI></UL> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, May 16, 2003 2:30:12 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file --- 1,38 ---- ! <html><head><title>Pattern Stories</title></head><body> ! ! ! ! <div class="wikitext"> ! <p><b>Pattern Stories ! ! <p>The parser uses the following patterns: ! ! <ul> ! ! <li><a HREF=FactoryMethod.html class="wiki">FactoryMethod</a> ! ! <li><a HREF=TemplateMethod.html class="wiki">TemplateMethod</a> ! ! <li><a HREF=IteratorPattern.html class="wiki">IteratorPattern</a> ! ! 
<li><a HREF=VisitorPattern.html class="wiki">VisitorPattern</a> ! ! <li><a HREF=CollectingParameter.html class="wiki">CollectingParameter</a> ! ! <li><a HREF=StrategyPattern.html class="wiki">StrategyPattern</a> ! ! <li><a HREF=CompositePattern.html class="wiki">CompositePattern</a> ! ! ! <p>--<a HREF=SomikRaha.html class="wiki">SomikRaha</a> ! ! ! ! <div id="actionbar" class="toolbar"> ! ! <hr class="printer" noshade="noshade" /> ! ! <p class="editdate">Last edited on Friday, May 16, 2003 2:30:12 pm. ! ! <hr class="toolbar" noshade="noshade" /> ! </body></html> \ No newline at end of file Index: PostOperation.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/PostOperation.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** PostOperation.html 24 Aug 2003 18:44:10 -0000 1.2 --- PostOperation.html 26 Oct 2003 19:46:17 -0000 1.3 *************** *** 1,18 **** ! <html><head><title>Post Operation</title></head><body><DIV class="wikitext"> ! <H4>POST Operation</H4> ! <P>The standard HTTP request submitted by the parser is a GET. This note describes how to use POST, which is the usual request submitted by a form.</P> ! <P>As an example, we'll submit a form to the U.S. postal service web site.<BR/><I>Note: This is suboptimal, the postal service provides tools for this type of thing: <A class="namedurl" href="http://www.uspswebtools.com"><SPAN style="white-space: nowrap">http://www.uspswebtools.com</SPAN></A></I><BR/></P> ! <P>On the USPS web site, the page <A class="namedurl" href="http://www.usps.com/zip4/citytown.htm"><SPAN style="white-space: nowrap">http://www.usps.com/zip4/citytown.htm</SPAN></A> has the following FORM that asks for a zip code and returns the cities or towns covered by the zip code (only form elements are shown removing all the formatting markup):</P> ! 
<PRE><form NAME="frmzip" ACTION="zip_response.jsp" METHOD="post" OnSubmit="return validate(frmzip)"> <input type="text" id="zipcode" name="zipcode" size="5" maxlength="5" TABINDEX="10"> ! <input TYPE="image" NAME="Submit" SRC="/zip4/images/submit.jpg" BORDER="0" WIDTH="50" HEIGHT="17" ALT="Submit" TABINDEX="11"></PRE> ! <P>From this we determine that the <TT>METHOD</TT> is <TT>POST</TT> and the form should be submitted to <TT>zip_response.jsp</TT>. This relative URL is relative to the page it is found on, so the form should be submitted to <TT>http://www.usps.com/zip4/zip_response.jsp</TT> when the <TT>Submit</TT> input is clicked. The only <TT>input</TT> element other than the ... [truncated message content]
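The PostOperation.html text quoted in the diff resolves the form's relative ACTION against the URL of the page it appears on. A minimal sketch of that step using only the JDK follows; it is not the htmlparser API and not the code from the truncated note. The class name PostSketch, both helper methods, and the zip code value 90210 are hypothetical, chosen for illustration. Only the page URL, the ACTION value, and the zipcode field name come from the quoted form.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of htmlparser.
class PostSketch {

    // Resolve a form's relative ACTION against the URL of the page holding the form.
    static String resolveAction(String pageUrl, String action) {
        return URI.create(pageUrl).resolve(action).toString();
    }

    // Encode one name/value pair as application/x-www-form-urlencoded,
    // the default encoding a browser uses for a POSTed form.
    static String formBody(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Page URL and ACTION are taken from the quoted form.
        System.out.println(resolveAction(
                "http://www.usps.com/zip4/citytown.htm", "zip_response.jsp"));
        // prints http://www.usps.com/zip4/zip_response.jsp

        // "90210" is an assumed sample value for the zipcode field.
        System.out.println(formBody("zipcode", "90210"));
        // prints zipcode=90210
    }
}
```

The sketch stops short of opening a connection, so it runs offline. Sending the request would mean writing the encoded body to a connection opened on the resolved URL with the request method set to POST.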