htmlparser-cvs Mailing List for HTML Parser (Page 45)
Brought to you by: derrickoswald
Messages per month:

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
---|---|---|---|---|---|---|---|---|---|---|---|---
2003 | | | | | 141 | 108 | 66 | 127 | 155 | 149 | 72 | 72
2004 | 100 | 36 | 21 | 3 | 87 | 28 | 84 | 5 | 14 | | |
2005 | 1 | 39 | 26 | 38 | 14 | 10 | | | 13 | 8 | 10 |
2006 | | 1 | 17 | 20 | 28 | 24 | | | | | |
2015 | | | 1 | | | | | | | | |
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv12126/src/org/htmlparser

Modified Files: AbstractNode.java Node.java NodeReader.java Parser.java RemarkNode.java RemarkNodeParser.java StringNode.java package.html
Log Message:
Update version headers to 1.4-20030824 and update changelog.

Index: AbstractNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/AbstractNode.java,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** AbstractNode.java 24 Aug 2003 19:40:17 -0000 1.9
--- AbstractNode.java 24 Aug 2003 21:59:41 -0000 1.10
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: Node.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** Node.java 24 Aug 2003 19:40:17 -0000 1.35
--- Node.java 24 Aug 2003 21:59:41 -0000 1.36
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: NodeReader.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/NodeReader.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** NodeReader.java 11 Aug 2003 00:18:28 -0000 1.39
--- NodeReader.java 24 Aug 2003 21:59:41 -0000 1.40
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.55
retrieving revision 1.56
diff -C2 -d -r1.55 -r1.56
*** Parser.java 24 Aug 2003 19:40:17 -0000 1.55
--- Parser.java 24 Aug 2003 21:59:41 -0000 1.56
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
***************
*** 157,161 ****
  */
  public final static String
! VERSION_DATE = "Aug 10, 2003"
  ;
--- 157,161 ----
  */
  public final static String
! VERSION_DATE = "Aug 24, 2003"
  ;

Index: RemarkNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/RemarkNode.java,v
retrieving revision 1.26
retrieving revision 1.27
diff -C2 -d -r1.26 -r1.27
*** RemarkNode.java 24 Aug 2003 19:40:17 -0000 1.26
--- RemarkNode.java 24 Aug 2003 21:59:41 -0000 1.27
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: RemarkNodeParser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/RemarkNodeParser.java,v
retrieving revision 1.26
retrieving revision 1.27
diff -C2 -d -r1.26 -r1.27
*** RemarkNodeParser.java 11 Aug 2003 00:18:28 -0000 1.26
--- RemarkNodeParser.java 24 Aug 2003 21:59:41 -0000 1.27
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: StringNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/StringNode.java,v
retrieving revision 1.34
retrieving revision 1.35
diff -C2 -d -r1.34 -r1.35
*** StringNode.java 24 Aug 2003 20:49:44 -0000 1.34
--- StringNode.java 24 Aug 2003 21:59:41 -0000 1.35
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/package.html,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** package.html 11 Aug 2003 00:18:28 -0000 1.12
--- package.html 24 Aug 2003 21:59:41 -0000 1.13
***************
*** 6,10 ****
  @(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  Copyright (C) Dec 31, 2000 Somik Raha
--- 6,10 ----
  @(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  Copyright (C) Dec 31, 2000 Somik Raha
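The version bump above only touches the one-line banner at the top of each file (plus the VERSION_DATE constant in Parser.java). As an illustration, here is a minimal Java sketch of how such a banner update could be automated. The class and method names are hypothetical, it assumes the banner always reads `HTMLParser Library v1_4_` followed by an eight-digit yyyyMMdd date, and nothing in this commit says the change was actually scripted this way.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HTML Parser): rewrites the build stamp
// in banners such as "// HTMLParser Library v1_4_20030810 - ...".
final class VersionHeaderBump {
    // The lookbehind keeps the fixed prefix and matches only the 8-digit date.
    private static final Pattern STAMP =
        Pattern.compile("(?<=HTMLParser Library v1_4_)\\d{8}");

    private VersionHeaderBump() {}

    /** Replaces every yyyyMMdd build stamp in text with newStamp. */
    static String bump(String text, String newStamp) {
        if (!newStamp.matches("\\d{8}"))
            throw new IllegalArgumentException("expected yyyyMMdd: " + newStamp);
        // newStamp holds only digits, so it is safe as a replacement string.
        return STAMP.matcher(text).replaceAll(newStamp);
    }

    /** Converts a stamp like "20030824" to the VERSION_DATE style "Aug 24, 2003". */
    static String versionDate(String stamp) {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd", Locale.US);
        in.setLenient(false);
        SimpleDateFormat out = new SimpleDateFormat("MMM d, yyyy", Locale.US);
        try {
            return out.format(in.parse(stamp));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad stamp: " + stamp, e);
        }
    }
}
```

Because the pattern keys only on the `HTMLParser Library v1_4_` prefix, it would also cover the `///` banner variant in HTMLTextBean.java and the comment-less banner line in package.html.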
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans
In directory sc8-pr-cvs1:/tmp/cvs-serv12126/src/org/htmlparser/beans

Modified Files: BeanyBaby.java HTMLLinkBean.java HTMLTextBean.java LinkBean.java StringBean.java package.html
Log Message:
Update version headers to 1.4-20030824 and update changelog.

Index: BeanyBaby.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/BeanyBaby.java,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** BeanyBaby.java 11 Aug 2003 00:18:28 -0000 1.12
--- BeanyBaby.java 24 Aug 2003 21:59:41 -0000 1.13
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: HTMLLinkBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/HTMLLinkBean.java,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** HTMLLinkBean.java 24 Aug 2003 19:40:17 -0000 1.13
--- HTMLLinkBean.java 24 Aug 2003 21:59:41 -0000 1.14
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: HTMLTextBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/HTMLTextBean.java,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** HTMLTextBean.java 24 Aug 2003 19:40:17 -0000 1.14
--- HTMLTextBean.java 24 Aug 2003 21:59:41 -0000 1.15
***************
*** 1,3 ****
! /// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! /// HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: LinkBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/LinkBean.java,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** LinkBean.java 24 Aug 2003 19:40:18 -0000 1.17
--- LinkBean.java 24 Aug 2003 21:59:41 -0000 1.18
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: StringBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** StringBean.java 24 Aug 2003 20:30:55 -0000 1.23
--- StringBean.java 24 Aug 2003 21:59:41 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //
--- 1,3 ----
! // HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  // Copyright (C) Dec 31, 2000 Somik Raha
  //

Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/package.html,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** package.html 11 Aug 2003 00:18:28 -0000 1.11
--- package.html 24 Aug 2003 21:59:41 -0000 1.12
***************
*** 6,10 ****
  @(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030810 - A java-based parser for HTML
  Copyright (C) Dec 31, 2000 Somik Raha
--- 6,10 ----
  @(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030824 - A java-based parser for HTML
  Copyright (C) Dec 31, 2000 Somik Raha
From: <der...@us...> - 2003-08-24 22:00:14
Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1:/tmp/cvs-serv12126/docs

Modified Files: changes.txt release.txt
Log Message:
Update version headers to 1.4-20030824 and update changelog.

Index: changes.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/changes.txt,v
retrieving revision 1.187
retrieving revision 1.188
diff -C2 -d -r1.187 -r1.188
*** changes.txt 11 Aug 2003 00:18:28 -0000 1.187
--- changes.txt 24 Aug 2003 21:59:41 -0000 1.188
***************
*** 13,16 ****
--- 13,275 ----
  *******************************************************************************
+ Integration Build 1.4 - 20030824
+ --------------------------------
+
+ 2003-08-24 17:07 derrickoswald
+
+     * build.xml:
+
+     Update groups list.
+
+ 2003-08-24 16:49 derrickoswald
+
+     * src/org/htmlparser/: StringNode.java, scanners/TagScanner.java,
+       tags/AppletTag.java, tags/CompositeTag.java, tags/DoctypeTag.java,
+       tags/EndTag.java, tags/JspTag.java, tags/ScriptTag.java,
+       tags/StyleTag.java, tags/Tag.java:
+
+     Cure javadoc warnings about invalid parameters and broken links, take two.
+
+ 2003-08-24 16:30 derrickoswald
+
+     * src/org/htmlparser/beans/StringBean.java:
+
+     Fix extra carriage returns in text output.
+
+ 2003-08-24 15:40 derrickoswald
+
+     * src/org/htmlparser/: AbstractNode.java, Node.java, Parser.java,
+       RemarkNode.java, beans/HTMLLinkBean.java, beans/HTMLTextBean.java,
+       beans/LinkBean.java, beans/StringBean.java,
+       parserHelper/AttributeParser.java, scanners/FormScanner.java,
+       scanners/FrameScanner.java, scanners/ImageScanner.java,
+       tags/Tag.java:
+
+     Cure javadoc warnings about invalid parameters and broken links.
+
+ 2003-08-24 14:54 somik
+
+     * src/org/htmlparser/visitors/CompositeTagFindingVisitor.java:
+
+     removed unused class
+
+ 2003-08-24 14:53 somik
+
+     * src/org/htmlparser/util/ParserUtils.java:
+
+     removed dead code
+
+ 2003-08-24 14:50 somik
+
+     * src/org/htmlparser/util/: Generate.java, IteratorImpl.java,
+       LinkProcessor.java:
+
+     removed unused local variables
+
+ 2003-08-24 14:47 somik
+
+     * src/org/htmlparser/tests/utilTests/: BeanTest.java,
+       NodeListTest.java:
+
+     removed unused imports
+
+ 2003-08-24 14:46 somik
+
+     * src/org/htmlparser/tests/: tagTests/TagTest.java,
+       temporaryFailures/AttributeParserTest.java,
+       utilTests/HTMLLinkProcessorTest.java:
+
+     removed unused local variables
+
+ 2003-08-24 14:44 somik
+
+     * src/org/htmlparser/tests/scannersTests/TableScannerTest.java:
+
+     reformatted
+
+ 2003-08-24 14:44 derrickoswald
+
+     * build.xml, docs/docs/BlockFeedback.html,
+       docs/docs/CollectingParameter.html,
+       docs/docs/CompositePattern.html,
+       docs/docs/CustomTagExtraction.html, docs/docs/EmailExtraction.html,
+       docs/docs/EnableFeedback.html, docs/docs/ExternalIterators.html,
+       docs/docs/FactoryMethod.html, docs/docs/FeedbackMechanism.html,
+       docs/docs/FirstName.html, docs/docs/FrequentlyAskedQuestions.html,
+       docs/docs/FullName.html, docs/docs/ImageExtraction.html,
+       docs/docs/InternalIterators.html, docs/docs/IteratorPattern.html,
+       docs/docs/JavaBeans.html, docs/docs/LastName.html,
+       docs/docs/LinkExtraction.html, docs/docs/ParserDesign.html,
+       docs/docs/ParsingXml.html, docs/docs/PatternStories.html,
+       docs/docs/PostOperation.html, docs/docs/ReverseHtml.html,
+       docs/docs/SamplePrograms.html, docs/docs/SearchingForData.html,
+       docs/docs/SomikRaha.html, docs/docs/StrategyPattern.html,
+       docs/docs/StringExtraction.html, docs/docs/TagFindingVisitor.html,
+       docs/docs/TagScanner.html, docs/docs/TemplateMethod.html,
+       docs/docs/TestDrivenDevelopment.html,
+       docs/docs/TextExtractingVisitor.html,
+       docs/docs/UnitTestingPdf.html, docs/docs/UnitTestingXsl.html,
+       docs/docs/UsingCookiesWithParser.html,
+       docs/docs/VisitorPattern.html, docs/docs/WebCrawler.html,
+       docs/docs/WebRipper.html, docs/docs/WritingYourOwnScanners.html,
+       docs/docs/index.html:
+
+     update Wiki image
+
+ 2003-08-24 14:43 somik
+
+     * src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java:
+
+     removed unused assertion
+
+ 2003-08-24 14:42 somik
+
+     * src/org/htmlparser/tests/scannersTests/: HeadScannerTest.java,
+       ImageScannerTest.java, TagScannerTest.java:
+
+     removed unused local variables
+
+ 2003-08-24 14:39 somik
+
+     * src/org/htmlparser/tests/: ParserTest.java, ParserTestCase.java,
+       utilTests/BeanTest.java,
+       scannersTests/CompositeTagScannerTest.java:
+
+     removed unused local variables
+
+ 2003-08-24 14:35 somik
+
+     * src/org/htmlparser/tests/BadTagIdentifier.java:
+
+     improved identify()
+
+ 2003-08-24 14:34 somik
+
+     * src/org/htmlparser/scanners/FrameScanner.java:
+
+     reformatted
+
+ 2003-08-24 14:32 somik
+
+     * src/org/htmlparser/: lexer/nodes/TagNode.java,
+       scanners/TagScanner.java:
+
+     removed unused local variables
+
+ 2003-08-24 14:31 somik
+
+     * src/org/htmlparser/lexer/Page.java:
+
+     removed unused imports and variables
+
+ 2003-08-24 14:30 somik
+
+     * src/org/htmlparser/: lexer/Lexer.java,
+       nodeDecorators/AbstractNodeDecorator.java:
+
+     removed unused imports
+
+ 2003-08-24 14:30 somik
+
+     * src/org/htmlparser/beans/: LinkBean.java, StringBean.java:
+
+     removed unused private variables
+
+ 2003-08-24 14:28 somik
+
+     * src/: fit/Attributes.java, org/htmlparser/Parser.java:
+
+     updated fit test
+
+ 2003-08-23 13:14 derrickoswald
+
+     * src/org/htmlparser/: AbstractNode.java, Node.java,
+       RemarkNode.java, StringNode.java, lexer/Lexer.java,
+       lexer/Page.java, lexer/nodes/RemarkNode.java,
+       lexer/nodes/StringNode.java, lexer/nodes/TagNode.java,
+       nodeDecorators/AbstractNodeDecorator.java,
+       parserHelper/SpecialHashtable.java, tags/CompositeTag.java,
+       tags/LinkTag.java, tags/SelectTag.java, tags/Tag.java,
+       tests/scannersTests/CompositeTagScannerTest.java,
+       tests/tagTests/CompositeTagTest.java,
+       tests/utilTests/NodeListTest.java,
+       tests/visitorsTests/AllTests.java, util/ChainedException.java,
+       util/NodeList.java:
+
+     Sixth drop for new i/o subsystem.
+     Isolated htmllexer.jar file and made it compileable and runnable on JDK 1.1 systems.
+     The build.xml file now has four new targets for separate compiling and jaring of the lexer and parser.
+     Significantly refactored the existing Node interface and AbstractNode class to achieve isolation.
+     They now support get/setChildren(), rather than CompositeTag.
+     Various scanners that were directly accessing the childTags node list were affected.
+     The get/setParent is now a generic Node rather than a CompositeTag.
+     The visitor accept() signature was changed to Object to avoid dragging in visitors code.
+     This was *not* changed on classes derived from Tag, although it could be.
+     ChainedException now uses/returns a Vector.
+     Removed the cruft from lexer nodes where possible.
+
+ 2003-08-22 21:33 derrickoswald
+
+     * build.xml, src/org/htmlparser/lexer/Lexer.java,
+       src/org/htmlparser/lexer/Page.java,
+       src/org/htmlparser/lexer/nodes/Attribute.java,
+       resources/Manifest.mf, resources/lexer, resources/runLexer.bat,
+       src/ExceptionMessages_en_US.properties,
+       src/ExceptionMessages_ja_JP.properties, src/Manifest.mf:
+
+     Fifth drop for new i/o subsystem.
+     There is now a mainline for the lexer.
+     Try:
+         java -jar htmllexer.jar http://whatever
+     or the integration build has a new lexer execution script:
+         bin/lexer http://whatever
+
+ 2003-08-20 21:52 derrickoswald
+
+     * src/org/htmlparser/: lexer/Cursor.java, lexer/Lexer.java,
+       lexer/Page.java, lexer/PageIndex.java, lexer/Source.java,
+       lexer/nodes/Attribute.java, lexer/nodes/TagNode.java,
+       tests/lexerTests/AllTests.java, tests/lexerTests/KitTest.java,
+       tests/lexerTests/LexerTests.java:
+
+     Fourth drop for new i/o subsystem.
+
+ 2003-08-17 12:09 derrickoswald
+
+     * src/org/htmlparser/: lexer/Cursor.java, lexer/Lexer.java,
+       lexer/Page.java, lexer/PageIndex.java, lexer/Source.java,
+       lexer/package.html, tests/lexerTests/AllTests.java,
+       tests/lexerTests/KitTest.java, tests/lexerTests/LexerTests.java,
+       tests/lexerTests/PageIndexTests.java,
+       tests/lexerTests/PageTests.java, tests/lexerTests/SourceTests.java,
+       lexer/nodes/AbstractNode.java, lexer/nodes/Attribute.java,
+       lexer/nodes/RemarkNode.java, lexer/nodes/StringNode.java,
+       lexer/nodes/TagNode.java, lexer/nodes/package.html:
+
+     Third drop for new i/o subsystem.
+
+ 2003-08-15 16:51 derrickoswald
+
+     * src/org/htmlparser/: parserHelper/AttributeParser.java,
+       scanners/TagScanner.java, tags/FormTag.java, tags/ImageTag.java,
+       tags/LinkTag.java, tags/Tag.java, tests/ParserTestCase.java,
+       tests/scannersTests/ScriptScannerTest.java,
+       tests/tagTests/InputTagTest.java, tests/tagTests/TagTest.java,
+       tests/temporaryFailures/AttributeParserTest.java:
+
+     Case maintaining toHtml() output for tag attributes.
+     With these changes, the output of toHtml() now reflects the upper/lower case values
+     of the input for the contents of tags, i.e. attribute names maintain their original case.
+     They're still out of order from how they are parsed, but this is a first step.
+     Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
+     method now checks a global flag to see if case matters when comparing strings.
+     As of this drop it ignores case when comparing HTML output. This will soon change.
+
+ 2003-08-10 23:53 derrickoswald
+
+     * build.xml:
+
+     Move libs to correct level in distribution zip.
+
  Integration Build 1.4 - 20030810
  --------------------------------

Index: release.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/release.txt,v
retrieving revision 1.46
retrieving revision 1.47
diff -C2 -d -r1.46 -r1.47
*** release.txt 27 Jul 2003 19:19:16 -0000 1.46
--- release.txt 24 Aug 2003 21:59:41 -0000 1.47
***************
*** 1,3 ****
! HTMLParser Version 1.4 (Integration Build Jul 27, 2003)
  *********************************************
--- 1,3 ----
! HTMLParser Version 1.4 (Integration Build Aug 24, 2003)
  *********************************************
***************
*** 10,14 ****
  A1. The distribution contains :
! (i) binary jar file - htmlparser.jar (in lib directory)
  (ii) source code - src.zip (in distribution directory)
--- 10,14 ----
  A1. The distribution contains :
! (i) binary jar files - htmlparser.jar and lexer.jar (in lib directory)
  (ii) source code - src.zip (in distribution directory)
***************
*** 22,27 ****
  (a) runParser.bat : Runs the html parser
  (b) runCrawler.bat : Runs the robot crawler
! (c) runRipper.bat : Runs the mail ripper
! All three batch files assume that java 1.2 (or upwards) is visible in your path.
  Issue the following command :
--- 22,29 ----
  (a) runParser.bat : Runs the html parser
  (b) runCrawler.bat : Runs the robot crawler
! (c) runRipper.bat : Runs the mail ripper
! (d) runLexer.bat : Runs the low lever lexer
! (e) lexer : Runs the low lever lexer on linux/unix
! All four batch files assume that java 1.2 (or upwards) is visible in your path.
  Issue the following command :
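The 2003-08-15 changelog entry says that ParserTestCase.assertSameString() now consults a global flag to decide whether case matters when comparing HTML output. The test-case source is not part of this message, so the following is only a minimal Java sketch of that idea; `CaseAwareCompare`, `caseSensitive` and `sameString` are hypothetical names, and the real method may differ.

```java
// Hypothetical sketch (not ParserTestCase itself): a string comparison
// for test output whose case sensitivity is controlled by one global flag.
final class CaseAwareCompare {
    // Global switch: when false, comparisons ignore upper/lower case.
    static boolean caseSensitive = false;

    private CaseAwareCompare() {}

    /** Returns true if the two HTML strings match under the current setting. */
    static boolean sameString(String expected, String actual) {
        if (expected == null || actual == null)
            return expected == actual;
        return caseSensitive
            ? expected.equals(actual)
            : expected.equalsIgnoreCase(actual);
    }

    /** Throws AssertionError if the strings differ, in the style of a JUnit assert. */
    static void assertSameString(String message, String expected, String actual) {
        if (!sameString(expected, actual))
            throw new AssertionError(message + ": expected <" + expected
                + "> but was <" + actual + ">");
    }
}
```

Keeping the switch in a single static field means the whole suite could later move to case-sensitive comparison, as the entry says it soon will, by flipping one value instead of editing every test.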
From: <der...@us...> - 2003-08-24 22:00:13
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv12126

Modified Files: build.xml
Log Message:
Update version headers to 1.4-20030824 and update changelog.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** build.xml 24 Aug 2003 21:07:23 -0000 1.44
--- build.xml 24 Aug 2003 21:59:40 -0000 1.45
***************
*** 389,392 ****
--- 389,393 ----
  <copy todir="${dist}/bin" >
  <fileset dir="${resources}" includes="*.bat"/>
+ <fileset dir="${resources}" includes="lexer"/>
  </copy>
  </target>
From: <der...@us...> - 2003-08-24 21:07:26
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv4501

Modified Files: build.xml
Log Message:
Update groups list.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** build.xml 24 Aug 2003 18:44:10 -0000 1.43
--- build.xml 24 Aug 2003 21:07:23 -0000 1.44
***************
*** 358,365 ****
  <group title="Example Applications" packages="org.htmlparser.parserapplications"/>
  <group title="Tags" packages="org.htmlparser.tags,org.htmlparser.tags.data"/>
  <group title="Scanners" packages="org.htmlparser.scanners"/>
  <group title="Beans" packages="org.htmlparser.beans"/>
  <group title="Visitors" packages="org.htmlparser.visitors"/>
! <group title="Utility Packages (of developer interest only)" packages="org.htmlparser.util"/>
  </javadoc>
  </target>
--- 358,366 ----
  <group title="Example Applications" packages="org.htmlparser.parserapplications"/>
  <group title="Tags" packages="org.htmlparser.tags,org.htmlparser.tags.data"/>
+ <group title="Lexer" packages="org.htmlparser.lexer,org.htmlparser.lexer.nodes"/>
  <group title="Scanners" packages="org.htmlparser.scanners"/>
  <group title="Beans" packages="org.htmlparser.beans"/>
  <group title="Visitors" packages="org.htmlparser.visitors"/>
! <group title="Utility Packages (of developer interest only)" packages="org.htmlparser.util,org.htmlparser.util.sort"/>
  </javadoc>
  </target>
From: <der...@us...> - 2003-08-24 21:01:52
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans
In directory sc8-pr-cvs1:/tmp/cvs-serv24669/beans

Modified Files: HTMLLinkBean.java HTMLTextBean.java LinkBean.java StringBean.java
Log Message:
Cure javadoc warnings about invalid parameters and broken links.

Index: HTMLLinkBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/HTMLLinkBean.java,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** HTMLLinkBean.java 11 Aug 2003 00:18:28 -0000 1.12
--- HTMLLinkBean.java 24 Aug 2003 19:40:17 -0000 1.13
***************
*** 105,109 ****
  * This removes a PropertyChangeListener that was registered for all properties.
  * <p><em>Delegates to the underlying StringBean</em>
! * @param the PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
--- 105,109 ----
  * This removes a PropertyChangeListener that was registered for all properties.
  * <p><em>Delegates to the underlying StringBean</em>
! * @param listener The PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
***************
*** 158,162 ****
  /**
  * Setter for property Connection.
! * @param url New value of property Connection.
  */
  public void setConnection (URLConnection connection)
--- 158,162 ----
  /**
  * Setter for property Connection.
! * @param connection New value of property Connection.
  */
  public void setConnection (URLConnection connection)

Index: HTMLTextBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/HTMLTextBean.java,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** HTMLTextBean.java 11 Aug 2003 00:18:28 -0000 1.13
--- HTMLTextBean.java 24 Aug 2003 19:40:17 -0000 1.14
***************
*** 91,95 ****
  * This removes a PropertyChangeListener that was registered for all properties.
  * <p><em>Delegates to the underlying StringBean</em>
! * @param the PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
--- 91,95 ----
  * This removes a PropertyChangeListener that was registered for all properties.
  * <p><em>Delegates to the underlying StringBean</em>
! * @param listener The PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
***************
*** 226,230 ****
  /**
  * Setter for property Connection.
! * @param url New value of property Connection.
  */
  public void setConnection (URLConnection connection)
--- 226,230 ----
  /**
  * Setter for property Connection.
! * @param connection New value of property Connection.
  */
  public void setConnection (URLConnection connection)

Index: LinkBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/LinkBean.java,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** LinkBean.java 24 Aug 2003 18:30:14 -0000 1.16
--- LinkBean.java 24 Aug 2003 19:40:18 -0000 1.17
***************
*** 160,164 ****
  * Remove a PropertyChangeListener from the listener list.
  * This removes a PropertyChangeListener that was registered for all properties.
! * @param the PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
--- 160,164 ----
  * Remove a PropertyChangeListener from the listener list.
  * This removes a PropertyChangeListener that was registered for all properties.
! * @param listener The PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
***************
*** 263,267 ****
  /**
  * Setter for property Connection.
! * @param url New value of property Connection.
  */
  public void setConnection (URLConnection connection)
--- 263,267 ----
  /**
  * Setter for property Connection.
! * @param connection New value of property Connection.
  */
  public void setConnection (URLConnection connection)

Index: StringBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** StringBean.java 24 Aug 2003 18:30:14 -0000 1.21
--- StringBean.java 24 Aug 2003 19:40:18 -0000 1.22
***************
*** 179,183 ****
  * Appends a newline to the buffer if there isn't one there already.
  * Except if the buffer is empty.
- * @param buffer The buffer to append to.
  */
  protected void carriage_return ()
--- 179,182 ----
***************
*** 356,360 ****
  * Remove a PropertyChangeListener from the listener list.
  * This removes a PropertyChangeListener that was registered for all properties.
! * @param the PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
--- 355,359 ----
  * Remove a PropertyChangeListener from the listener list.
  * This removes a PropertyChangeListener that was registered for all properties.
! * @param listener The PropertyChangeListener to be removed.
  */
  public void removePropertyChangeListener (PropertyChangeListener listener)
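The StringBean javadoc above describes carriage_return() as appending a newline only when the buffer is non-empty and does not already end in one, and the matching changelog entry mentions fixing extra carriage returns in text output. A minimal Java sketch of that rule follows. It assumes a plain StringBuilder and a `\n` terminator (the real StringBean may use a different buffer type and line separator), and `TextAccumulator` with its methods is a hypothetical name, not the bean's API.

```java
// Hypothetical stand-in for StringBean's text buffer; not the library's code.
final class TextAccumulator {
    // Assumed line terminator; the real bean may use the platform separator.
    private static final char NEWLINE = '\n';
    private final StringBuilder buffer = new StringBuilder();

    /** Appends raw text to the buffer. */
    void append(String text) {
        buffer.append(text);
    }

    /**
     * Mirrors the documented carriage_return() rule: append a newline
     * unless the buffer is empty or already ends with one, so repeated
     * calls never stack up blank lines.
     */
    void carriageReturn() {
        int length = buffer.length();
        if (length != 0 && buffer.charAt(length - 1) != NEWLINE)
            buffer.append(NEWLINE);
    }

    /** Returns the accumulated text. */
    String getText() {
        return buffer.toString();
    }
}
```

For example, calling carriageReturn() on an empty accumulator adds nothing, and appending "Hello" followed by two carriageReturn() calls leaves "Hello\n", with a single trailing newline.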
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1:/tmp/cvs-serv1898/tags

Modified Files: AppletTag.java CompositeTag.java DoctypeTag.java EndTag.java JspTag.java ScriptTag.java StyleTag.java Tag.java
Log Message:
Cure javadoc warnings about invalid parameters and broken links, take two.

Index: AppletTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/AppletTag.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** AppletTag.java 11 Aug 2003 00:18:30 -0000 1.22
--- AppletTag.java 24 Aug 2003 20:49:44 -0000 1.23
***************
*** 49,56 ****
  /**
  * HTMLAppletTag constructor comment.
! * @param nodeBegin int
! * @param nodeEnd int
! * @param tagContents java.lang.String
! * @param tagLine java.lang.String
  */
  public AppletTag(TagData tagData,CompositeTagData compositeTagData)
--- 49,54 ----
  /**
  * HTMLAppletTag constructor comment.
! * @param tagData The data for this tag.
! * @param compositeTagData The data for this composite tag.
  */
  public AppletTag(TagData tagData,CompositeTagData compositeTagData)

Index: CompositeTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/CompositeTag.java,v
retrieving revision 1.49
retrieving revision 1.50
diff -C2 -d -r1.49 -r1.50
*** CompositeTag.java 23 Aug 2003 17:14:45 -0000 1.49
--- CompositeTag.java 24 Aug 2003 20:49:44 -0000 1.50
***************
*** 170,174 ****
  * </code>
  * @param searchString search criterion
! * @param caseSensitivie specify whether this search should be case
  * sensitive
  * @return NodeList Collection of nodes whose string contents or
--- 170,174 ----
  * </code>
  * @param searchString search criterion
! * @param caseSensitive specify whether this search should be case
  * sensitive
  * @return NodeList Collection of nodes whose string contents or
***************
*** 246,251 ****
  * again. Note that the position is at a linear level alone - there
  * is no recursion in this method.
! * @param text
! * @return int
  */
  public int findPositionOf(Node searchNode) {
--- 246,251 ----
  * again. Note that the position is at a linear level alone - there
  * is no recursion in this method.
! * @param searchNode The child node to find.
! * @return The offset of the child tag or -1 if it was not found.
  */
  public int findPositionOf(Node searchNode) {

Index: DoctypeTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/DoctypeTag.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** DoctypeTag.java 11 Aug 2003 00:18:30 -0000 1.22
--- DoctypeTag.java 24 Aug 2003 20:49:44 -0000 1.23
***************
*** 37,46 ****
  public class DoctypeTag extends Tag
  {
! /**
  * The HTMLDoctypeTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param nodeBegin beginning position of the tag
! * @param nodeEnd ending position of the tag
! * @param tagContents contents of the remark tag
  */
  public DoctypeTag(TagData tagData)
--- 37,44 ----
  public class DoctypeTag extends Tag
  {
! /**
  * The HTMLDoctypeTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param tagData The data for this tag.
  */
  public DoctypeTag(TagData tagData)

Index: EndTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/EndTag.java,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** EndTag.java 11 Aug 2003 00:18:30 -0000 1.25
--- EndTag.java 24 Aug 2003 20:49:44 -0000 1.26
***************
*** 45,51 ****
  /**
  * Constructor takes 3 arguments to construct an EndTag object.
! * @param nodeBegin Beginning position of the end tag
! * @param nodeEnd Ending position of the end tag
! * @param tagContents Text contents of the tag
  */
  public EndTag(TagData tagData)
--- 45,49 ----
  /**
  * Constructor takes 3 arguments to construct an EndTag object.
! * @param tagData The data for this tag.
  */
  public EndTag(TagData tagData)

Index: JspTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/JspTag.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** JspTag.java 11 Aug 2003 00:18:30 -0000 1.23
--- JspTag.java 24 Aug 2003 20:49:44 -0000 1.24
***************
*** 39,45 ****
  * The HTMLJspTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param nodeBegin beginning position of the tag
! * @param nodeEnd ending position of the tag
! * @param tagContents contents of the remark tag
  */
  public JspTag(TagData tagData)
--- 39,43 ----
  * The HTMLJspTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param tagData The data for this tag.
  */
  public JspTag(TagData tagData)

Index: ScriptTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/ScriptTag.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** ScriptTag.java 11 Aug 2003 00:18:30 -0000 1.22
--- ScriptTag.java 24 Aug 2003 20:49:44 -0000 1.23
***************
*** 42,52 ****
  * The HTMLScriptTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param nodeBegin beginning position of the tag
! * @param nodeEnd ending position of the tag
! * @param tagContents The contents of the Script Tag (should be kept the same as that of the original Tag contents)
! * @param scriptCode The Javascript code b/w the tags
! * @param language The language parameter
! * @param type The type parameter
! * @param tagLine The current line being parsed, where the tag was found
  */
  public ScriptTag(TagData tagData,CompositeTagData compositeTagData)
--- 42,47 ----
  * The HTMLScriptTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param tagData The data for this tag.
! * @param compositeTagData The data for this composite tag.
  */
  public ScriptTag(TagData tagData,CompositeTagData compositeTagData)

Index: StyleTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/StyleTag.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** StyleTag.java 11 Aug 2003 00:18:30 -0000 1.21
--- StyleTag.java 24 Aug 2003 20:49:44 -0000 1.22
***************
*** 39,46 ****
  * The HTMLStyleTag is constructed by providing the beginning posn, ending posn
  * and the tag contents.
! * @param nodeBegin beginning position of the tag
! * @param nodeEnd ending position of the tag
! * @param styleCode The style code b/w the tags
!
* @param tagLine The current line being parsed, where the tag was found */ public StyleTag(TagData tagData,CompositeTagData compositeTagData) { --- 39,44 ---- * The HTMLStyleTag is constructed by providing the beginning posn, ending posn * and the tag contents. ! * @param tagData The data for this tag. ! * @param compositeTagData The data for this composite tag. */ public StyleTag(TagData tagData,CompositeTagData compositeTagData) { Index: Tag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/Tag.java,v retrieving revision 1.42 retrieving revision 1.43 diff -C2 -d -r1.42 -r1.43 *** Tag.java 24 Aug 2003 19:40:18 -0000 1.42 --- Tag.java 24 Aug 2003 20:49:44 -0000 1.43 *************** *** 389,394 **** /** ! * Sets the nodeBegin. ! * @param nodeBegin The nodeBegin to set */ public void setTagBegin(int tagBegin) { --- 389,394 ---- /** ! * Sets the tagBegin. ! * @param tagBegin The starting position of the tag. */ public void setTagBegin(int tagBegin) { *************** *** 397,402 **** /** ! * Gets the nodeBegin. ! * @return The nodeBegin value. */ public int getTagBegin() { --- 397,402 ---- /** ! * Gets the tagBegin. ! * @return The nstarting position of the tag. */ public int getTagBegin() { *************** *** 405,410 **** /** ! * Sets the nodeEnd. ! * @param nodeEnd The nodeEnd to set */ public void setTagEnd(int tagEnd) { --- 405,410 ---- /** ! * Sets the tagEnd. ! * @param tagEnd The ending position of the tag. */ public void setTagEnd(int tagEnd) { *************** *** 413,418 **** /** ! * Gets the nodeEnd. ! * @return The nodeEnd value. */ public int getTagEnd() { --- 413,418 ---- /** ! * Gets the tagEnd. ! * @return The ending position of the tag. */ public int getTagEnd() { |
From: <der...@us...> - 2003-08-24 20:49:47
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners In directory sc8-pr-cvs1:/tmp/cvs-serv1898/scanners Modified Files: TagScanner.java Log Message: Cure javadoc warnings about invalid parameters and broken links, take two. Index: TagScanner.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/TagScanner.java,v retrieving revision 1.34 retrieving revision 1.35 diff -C2 -d -r1.34 -r1.35 *** TagScanner.java 24 Aug 2003 18:34:25 -0000 1.34 --- TagScanner.java 24 Aug 2003 20:49:44 -0000 1.35 *************** *** 134,138 **** * the scanner id does not imply a match (or extra processing needs to be done). * Default returns true</strong> ! * @param s The complete text contents of the Tag. * @param previousOpenScanner Indicates any previous scanner which hasnt completed, before the current * scan has begun, and hence allows us to write scanners that can work with dirty html --- 134,138 ---- * the scanner id does not imply a match (or extra processing needs to be done). * Default returns true</strong> ! * @param tagContents The complete text contents of the Tag. * @param previousOpenScanner Indicates any previous scanner which hasnt completed, before the current * scan has begun, and hence allows us to write scanners that can work with dirty html |
From: <der...@us...> - 2003-08-24 20:49:47
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser In directory sc8-pr-cvs1:/tmp/cvs-serv1898 Modified Files: StringNode.java Log Message: Cure javadoc warnings about invalid parameters and broken links, take two. Index: StringNode.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/StringNode.java,v retrieving revision 1.33 retrieving revision 1.34 diff -C2 -d -r1.33 -r1.34 *** StringNode.java 23 Aug 2003 17:14:44 -0000 1.33 --- StringNode.java 24 Aug 2003 20:49:44 -0000 1.34 *************** *** 51,58 **** * @param textEnd The ending positiong of the string */ ! public StringNode(StringBuffer textBuffer,int textBegin,int textEnd) { super(textBegin,textEnd); ! this.textBuffer = textBuffer; } --- 51,58 ---- * @param textEnd The ending positiong of the string */ ! public StringNode (StringBuffer text, int textBegin,int textEnd) { super(textBegin,textEnd); ! this.textBuffer = text; } *************** *** 65,69 **** /** * Sets the string contents of the node. ! * @param The new text for the node. */ public void setText(String text) --- 65,69 ---- /** * Sets the string contents of the node. ! * @param text The new text for the node. */ public void setText(String text) |
From: <der...@us...> - 2003-08-24 20:30:58
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans In directory sc8-pr-cvs1:/tmp/cvs-serv31843 Modified Files: StringBean.java Log Message: Fix extra carriage returns in text output. Index: StringBean.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v retrieving revision 1.22 retrieving revision 1.23 diff -C2 -d -r1.22 -r1.23 *** StringBean.java 24 Aug 2003 19:40:18 -0000 1.22 --- StringBean.java 24 Aug 2003 20:30:55 -0000 1.23 *************** *** 612,617 **** else if (name.equalsIgnoreCase ("SCRIPT")) mIsScript = false; - if (end.breaksFlow ()) - carriage_return (); } --- 612,615 ---- |
From: <der...@us...> - 2003-08-24 19:40:20
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser In directory sc8-pr-cvs1:/tmp/cvs-serv24669 Modified Files: AbstractNode.java Node.java Parser.java RemarkNode.java Log Message: Cure javadoc warnings about invalid parameters and broken links. Index: AbstractNode.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/AbstractNode.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** AbstractNode.java 23 Aug 2003 17:14:44 -0000 1.8 --- AbstractNode.java 24 Aug 2003 19:40:17 -0000 1.9 *************** *** 233,237 **** /** * Sets the string contents of the node. ! * @param The new text for the node. */ public void setText(String text) { --- 233,237 ---- /** * Sets the string contents of the node. ! * @param text The new text for the node. */ public void setText(String text) { Index: Node.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v retrieving revision 1.34 retrieving revision 1.35 diff -C2 -d -r1.34 -r1.35 *** Node.java 23 Aug 2003 17:14:44 -0000 1.34 --- Node.java 24 Aug 2003 19:40:17 -0000 1.35 *************** *** 163,167 **** /** * Sets the string contents of the node. ! * @param The new text for the node. */ public void setText(String text); --- 163,167 ---- /** * Sets the string contents of the node. ! * @param text The new text for the node. */ public void setText(String text); Index: Parser.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v retrieving revision 1.54 retrieving revision 1.55 diff -C2 -d -r1.54 -r1.55 *** Parser.java 24 Aug 2003 18:29:18 -0000 1.54 --- Parser.java 24 Aug 2003 19:40:17 -0000 1.55 *************** *** 249,253 **** /** ! 
* @param lineSeparator New Line separator to be used */ public static void setLineSeparator(String lineSeparatorString) --- 249,253 ---- /** ! * @param lineSeparatorString New Line separator to be used */ public static void setLineSeparator(String lineSeparatorString) Index: RemarkNode.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/RemarkNode.java,v retrieving revision 1.25 retrieving revision 1.26 diff -C2 -d -r1.25 -r1.26 *** RemarkNode.java 23 Aug 2003 17:14:44 -0000 1.25 --- RemarkNode.java 24 Aug 2003 19:40:17 -0000 1.26 *************** *** 50,58 **** * @param nodeEnd ending position of the tag * @param tagContents contents of the remark tag - * @param tagLine The current line being parsed, where the tag was found */ ! public RemarkNode(int tagBegin, int tagEnd, String tagContents) { ! super(tagBegin,tagEnd); this.tagContents = tagContents; } --- 50,57 ---- * @param nodeEnd ending position of the tag * @param tagContents contents of the remark tag */ ! public RemarkNode(int nodeBegin, int nodeEnd, String tagContents) { ! super(nodeBegin,nodeEnd); this.tagContents = tagContents; } |
From: <so...@us...> - 2003-08-24 18:54:39
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors In directory sc8-pr-cvs1:/tmp/cvs-serv18010/src/org/htmlparser/visitors Removed Files: CompositeTagFindingVisitor.java Log Message: removed unused class --- CompositeTagFindingVisitor.java DELETED --- |
From: <so...@us...> - 2003-08-24 18:53:06
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util In directory sc8-pr-cvs1:/tmp/cvs-serv17833/src/org/htmlparser/util Modified Files: ParserUtils.java Log Message: removed dead code Index: ParserUtils.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v retrieving revision 1.24 retrieving revision 1.25 diff -C2 -d -r1.24 -r1.25 *** ParserUtils.java 11 Aug 2003 00:18:36 -0000 1.24 --- ParserUtils.java 24 Aug 2003 18:53:03 -0000 1.25 *************** *** 35,135 **** import org.htmlparser.Node; import org.htmlparser.NodeReader; - import org.htmlparser.scanners.TagScanner; import org.htmlparser.tags.Tag; ! public class ParserUtils ! { ! public static boolean evaluateTag(TagScanner pTagScanner, ! String pTagString, String pTagName) ! { ! pTagString = TagScanner.absorbLeadingBlanks(pTagString); ! if (pTagString.toUpperCase().indexOf(pTagName)==0) ! return true; ! else ! return false; ! } ! ! public static String toHTML(Tag tag) ! { ! StringBuffer htmlString = new StringBuffer(); ! ! Hashtable attrs = tag.getAttributes(); ! String pTagName = tag.getAttribute(Tag.TAGNAME); ! htmlString.append("<").append(pTagName); ! for (Enumeration e = attrs.keys();e.hasMoreElements();) ! { ! String key = (String)e.nextElement(); ! String value = (String)attrs.get(key); ! if (!key.equalsIgnoreCase(Tag.TAGNAME) && value.length()>0) ! htmlString.append(" ").append(key).append("=\"").append(value).append("\""); ! } ! htmlString.append(">"); ! ! return htmlString.toString(); ! } ! public static String toString(Tag tag) ! { String tagName = tag.getAttribute(Tag.TAGNAME); Hashtable attrs = tag.getAttributes(); ! StringBuffer lString = new StringBuffer(tagName); lString.append(" TAG\n"); lString.append("--------\n"); ! ! for (Enumeration e = attrs.keys();e.hasMoreElements();) ! { ! String key = (String)e.nextElement(); ! String value = (String)attrs.get(key); ! 
if (!key.equalsIgnoreCase(Tag.TAGNAME) && value.length()>0) lString.append(key).append(" : ").append(value).append("\n"); } ! return lString.toString(); } ! ! public static Map adjustScanners(NodeReader reader) ! { ! Map tempScanners= new Hashtable(); ! tempScanners = reader.getParser().getScanners(); // Remove all existing scanners reader.getParser().flushScanners(); return tempScanners; } ! public static void restoreScanners(NodeReader reader, Map tempScanners) ! { // Flush the scanners reader.getParser().setScanners(tempScanners); } ! public static String removeChars(String s,char occur) { ! StringBuffer newString = new StringBuffer(); ! char ch; ! for (int i=0;i<s.length();i++) { ! ch = s.charAt(i); ! if (ch!=occur) newString.append(ch); ! } ! return newString.toString(); } public static String removeEscapeCharacters(String inputString) { ! inputString = ParserUtils.removeChars(inputString,'\r'); ! inputString = ParserUtils.removeChars(inputString,'\n'); ! inputString = ParserUtils.removeChars(inputString,'\t'); return inputString; } public static String removeLeadingBlanks(String plainText) { ! while (plainText.indexOf(' ')==0) ! plainText=plainText.substring(1); return plainText; } public static String removeTrailingBlanks(String text) { ! char ch = ' ' ; ! while (ch==' '){ ! ch = text.charAt(text.length()-1); ! if (ch==' ') ! text = text.substring(0,text.length()-1); } return text; --- 35,103 ---- import org.htmlparser.Node; import org.htmlparser.NodeReader; import org.htmlparser.tags.Tag; ! public class ParserUtils { ! public static String toString(Tag tag) { String tagName = tag.getAttribute(Tag.TAGNAME); Hashtable attrs = tag.getAttributes(); ! StringBuffer lString = new StringBuffer(tagName); lString.append(" TAG\n"); lString.append("--------\n"); ! ! for (Enumeration e = attrs.keys(); e.hasMoreElements();) { ! String key = (String) e.nextElement(); ! String value = (String) attrs.get(key); ! 
if (!key.equalsIgnoreCase(Tag.TAGNAME) && value.length() > 0) lString.append(key).append(" : ").append(value).append("\n"); } ! return lString.toString(); } ! ! public static Map adjustScanners(NodeReader reader) { ! Map tempScanners = new Hashtable(); ! tempScanners = reader.getParser().getScanners(); // Remove all existing scanners reader.getParser().flushScanners(); return tempScanners; } ! ! public static void restoreScanners(NodeReader reader, Map tempScanners) { // Flush the scanners reader.getParser().setScanners(tempScanners); } ! public static String removeChars(String s, char occur) { ! StringBuffer newString = new StringBuffer(); ! char ch; ! for (int i = 0; i < s.length(); i++) { ! ch = s.charAt(i); ! if (ch != occur) ! newString.append(ch); ! } ! return newString.toString(); } public static String removeEscapeCharacters(String inputString) { ! inputString = ParserUtils.removeChars(inputString, '\r'); ! inputString = ParserUtils.removeChars(inputString, '\n'); ! inputString = ParserUtils.removeChars(inputString, '\t'); return inputString; } public static String removeLeadingBlanks(String plainText) { ! while (plainText.indexOf(' ') == 0) ! plainText = plainText.substring(1); return plainText; } public static String removeTrailingBlanks(String text) { ! char ch = ' '; ! while (ch == ' ') { ! ch = text.charAt(text.length() - 1); ! if (ch == ' ') ! text = text.substring(0, text.length() - 1); } return text; *************** *** 145,152 **** public static Node[] findTypeInNode(Node node, Class type) { NodeList nodeList = new NodeList(); ! node.collectInto(nodeList,type); ! Node spans [] = nodeList.toNodeArray(); return spans; ! } ! } --- 113,120 ---- public static Node[] findTypeInNode(Node node, Class type) { NodeList nodeList = new NodeList(); ! node.collectInto(nodeList, type); ! Node spans[] = nodeList.toNodeArray(); return spans; ! } ! } |
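This cleanup leaves removeLeadingBlanks and removeTrailingBlanks unchanged. Be aware that removeTrailingBlanks reads text.charAt(text.length() - 1) before it checks the length. An empty string, or a string made only of blanks, therefore ends in a StringIndexOutOfBoundsException.

The sketch below is a hypothetical index-based variant. The class name BlankTrimmer is illustrative and not part of HTMLParser. Like the originals, it strips only ' ' characters, but it returns an empty string instead of throwing.

```java
// Hypothetical helper; BlankTrimmer is an illustrative name, not an HTMLParser class.
final class BlankTrimmer {
    private BlankTrimmer() {}

    /** Strips leading ' ' characters only; tabs and newlines are kept, as in removeLeadingBlanks. */
    static String removeLeadingBlanks(String text) {
        int start = 0;
        while (start < text.length() && text.charAt(start) == ' ') {
            start++;
        }
        return text.substring(start);
    }

    /** Strips trailing ' ' characters; returns "" for empty or all-blank input instead of throwing. */
    static String removeTrailingBlanks(String text) {
        int end = text.length();
        while (end > 0 && text.charAt(end - 1) == ' ') {
            end--;
        }
        return text.substring(0, end);
    }
}
```

Scanning with an index also avoids building a new substring on every iteration, which the loops in the diff do.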
From: <so...@us...> - 2003-08-24 18:50:06
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util In directory sc8-pr-cvs1:/tmp/cvs-serv17148/src/org/htmlparser/util Modified Files: IteratorImpl.java LinkProcessor.java Generate.java Log Message: removed unused local variables Index: IteratorImpl.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/IteratorImpl.java,v retrieving revision 1.23 retrieving revision 1.24 diff -C2 -d -r1.23 -r1.24 *** IteratorImpl.java 11 Aug 2003 00:18:36 -0000 1.23 --- IteratorImpl.java 24 Aug 2003 18:50:02 -0000 1.24 *************** *** 81,85 **** */ public boolean hasMoreNodes() throws ParserException { - Node node; boolean ret; --- 81,84 ---- Index: LinkProcessor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/LinkProcessor.java,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** LinkProcessor.java 11 Aug 2003 00:18:36 -0000 1.21 --- LinkProcessor.java 24 Aug 2003 18:50:02 -0000 1.22 *************** *** 64,71 **** ParserException { - String path; // path portion of constructed URL - boolean modified; // true if path is modified by us - boolean absolute; // true if link starts with "/" - int index; String ret; --- 64,67 ---- *************** *** 163,168 **** * @return <code>true</code> if the resource is a valid URL. */ ! public static boolean isURL (String resourceLocn) ! { URL url; boolean ret; --- 159,163 ---- * @return <code>true</code> if the resource is a valid URL. */ ! 
public static boolean isURL (String resourceLocn) { URL url; boolean ret; Index: Generate.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/Generate.java,v retrieving revision 1.37 retrieving revision 1.38 diff -C2 -d -r1.37 -r1.38 *** Generate.java 11 Aug 2003 00:18:36 -0000 1.37 --- Generate.java 24 Aug 2003 18:50:02 -0000 1.38 *************** *** 360,365 **** String token; String code; - int comment; - String description; if (string.startsWith ("<!--")) --- 360,363 ---- *************** *** 414,420 **** int begin; int end; - StringBuffer ret; - - ret = new StringBuffer (4096); index = 0; --- 412,415 ---- |
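The LinkProcessor hunk only moves the opening brace of isURL. The method body is cut off after its local declarations (URL url; boolean ret;). Assuming it validates input by constructing a java.net.URL, a minimal standalone sketch could look like the one below. UrlCheck is a hypothetical name, and this is not the library's actual method body.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch; UrlCheck is an illustrative name, not the HTMLParser LinkProcessor.
final class UrlCheck {
    private UrlCheck() {}

    /** Returns true if java.net.URL accepts the string as an absolute URL with a known protocol. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs but still works
    static boolean isURL(String resourceLocn) {
        try {
            new URL(resourceLocn); // throws for missing or unknown protocols
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Because java.net.URL only accepts protocols that have a registered handler, relative paths and custom schemes both yield false.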
From: <so...@us...> - 2003-08-24 18:48:38
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1:/tmp/cvs-serv16922/src/org/htmlparser/tests/utilTests Modified Files: NodeListTest.java Log Message: removed unused imports Index: NodeListTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/NodeListTest.java,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** NodeListTest.java 23 Aug 2003 17:14:46 -0000 1.12 --- NodeListTest.java 24 Aug 2003 18:48:36 -0000 1.13 *************** *** 34,38 **** import org.htmlparser.util.NodeList; import org.htmlparser.util.SimpleNodeIterator; - import org.htmlparser.visitors.NodeVisitor; public class NodeListTest extends ParserTestCase { --- 34,37 ---- |
From: <so...@us...> - 2003-08-24 18:48:17
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1:/tmp/cvs-serv16842/src/org/htmlparser/tests/utilTests Modified Files: HTMLLinkProcessorTest.java Log Message: removed unused local variables Index: HTMLLinkProcessorTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/HTMLLinkProcessorTest.java,v retrieving revision 1.40 retrieving revision 1.41 diff -C2 -d -r1.40 -r1.41 *** HTMLLinkProcessorTest.java 11 Aug 2003 00:18:34 -0000 1.40 --- HTMLLinkProcessorTest.java 24 Aug 2003 18:48:14 -0000 1.41 *************** *** 57,61 **** String url = "http://htmlparser.sourceforge.net/test/This is a Test Page.html"; String fixedURL = LinkProcessor.fixSpaces(url); - int index = fixedURL.indexOf(" "); assertEquals("Expected","http://htmlparser.sourceforge.net/test/This%20is%20a%20Test%20Page.html",fixedURL); } --- 57,60 ---- |
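The removed local index was never read. The remaining assertion alone defines the behavior of LinkProcessor.fixSpaces: every space becomes %20. The one-line equivalent below satisfies the same expectation. SpaceFixer is an illustrative name, not the library's implementation.

```java
// Hypothetical sketch; SpaceFixer is an illustrative name, not HTMLParser's LinkProcessor.
final class SpaceFixer {
    private SpaceFixer() {}

    /** Percent-encodes each literal space as %20; every other character is left untouched. */
    static String fixSpaces(String url) {
        return url.replace(" ", "%20");
    }
}
```

This sketch encodes only spaces. Other characters that are illegal in URLs pass through unchanged.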
From: <so...@us...> - 2003-08-24 18:47:51
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1:/tmp/cvs-serv16777/src/org/htmlparser/tests/utilTests Modified Files: BeanTest.java Log Message: removed unused imports Index: BeanTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/BeanTest.java,v retrieving revision 1.36 retrieving revision 1.37 diff -C2 -d -r1.36 -r1.37 *** BeanTest.java 24 Aug 2003 18:39:11 -0000 1.36 --- BeanTest.java 24 Aug 2003 18:47:48 -0000 1.37 *************** *** 42,47 **** import java.util.Vector; - import junit.framework.TestCase; - import org.htmlparser.Node; import org.htmlparser.Parser; --- 42,45 ---- |
From: <so...@us...> - 2003-08-24 18:47:23
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures In directory sc8-pr-cvs1:/tmp/cvs-serv16689/src/org/htmlparser/tests/temporaryFailures Modified Files: AttributeParserTest.java Log Message: removed unused local variables Index: AttributeParserTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures/AttributeParserTest.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** AttributeParserTest.java 15 Aug 2003 20:51:48 -0000 1.8 --- AttributeParserTest.java 24 Aug 2003 18:47:20 -0000 1.9 *************** *** 175,181 **** */ public void testJspWithinAttributes() { - Parser parser; - - parser = new Parser (); if (1.4 <= Parser.getVersionNumber ()) { --- 175,178 ---- *************** *** 197,203 **** */ public void testScriptedTag () { - Parser parser; - - parser = new Parser (); if (1.4 <= Parser.getVersionNumber ()) { --- 194,197 ---- |
From: <so...@us...> - 2003-08-24 18:46:49
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests In directory sc8-pr-cvs1:/tmp/cvs-serv16558/src/org/htmlparser/tests/tagTests Modified Files: TagTest.java Log Message: removed unused local variables Index: TagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v retrieving revision 1.37 retrieving revision 1.38 diff -C2 -d -r1.37 -r1.38 *** TagTest.java 15 Aug 2003 20:51:48 -0000 1.37 --- TagTest.java 24 Aug 2003 18:46:46 -0000 1.38 *************** *** 91,95 **** */ public void testNestedTags() throws ParserException { - EndTag etag; String s = "input type=\"text\" value=\"<%=\"test\"%>\" name=\"text\""; String line = "<"+s+">"; --- 91,94 ---- *************** *** 108,113 **** public void testParseParameter3() throws ParserException { Tag tag; - EndTag etag; - StringNode snode; Node node=null; String lin1 = "<DIV class=\"userData\" id=\"oLayout\" name=\"oLayout\"></DIV>"; --- 107,110 ---- *************** *** 115,121 **** NodeIterator en = parser.elements(); Hashtable h; ! boolean testEnd=true; // test end of first part ! String a,href,myPara,myValue,nice; ! try { --- 112,116 ---- NodeIterator en = parser.elements(); Hashtable h; ! try { *************** *** 149,154 **** NodeIterator en = parser.elements(); Hashtable h; ! boolean testEnd=true; // test end of first part ! String a,href,myPara,myValue,nice; try { --- 144,148 ---- NodeIterator en = parser.elements(); Hashtable h; ! String a,href,myValue,nice; try { *************** *** 220,225 **** NodeIterator en = parser.elements(); Hashtable h; ! boolean testEnd=true; // test end of first part ! String a,href,myPara,myValue,nice; try { --- 214,218 ---- NodeIterator en = parser.elements(); Hashtable h; ! String a,href,myValue,nice; try { *************** *** 289,294 **** NodeIterator en = parser.elements(); Hashtable h; ! boolean testEnd=true; // test end of first part ! 
String a,href,myPara,myValue,nice; try { --- 282,286 ---- NodeIterator en = parser.elements(); Hashtable h; ! String a,nice; try { *************** *** 392,398 **** */ public void testWithoutParseParameter() throws ParserException{ - Tag tag; - EndTag etag; - StringNode snode; Node node=null; String testHTML = "<A href=\"http://www.iki.fi/kaila\" myParameter yourParameter=\"Kaarle\">Kaarle's homepage</A><p>Paragraph</p>"; --- 384,387 ---- *************** *** 418,424 **** */ public void testEmptyTagParseParameter() throws ParserException{ - Tag tag; - EndTag etag; - StringNode snode; Node node=null; String testHTML = "<INPUT name=\"foo\" value=\"foobar\" type=\"text\" />"; --- 407,410 ---- *************** *** 481,485 **** Tag tag = (Tag)node[0]; assertStringEquals("Node contents","META NAME=\"Author\" CONTENT=\"DORIER-APPRILL E., GERVAIS-LAMBONY P., MORICONI-EBRARD F., NAVEZ-BOUCHANINE F.\"",tag.getText()); - Hashtable table = tag.getAttributes(); assertEquals("Meta Content","DORIER-APPRILL E., GERVAIS-LAMBONY P., MORICONI-EBRARD F., NAVEZ-BOUCHANINE F.",tag.getAttribute("CONTENT")); --- 467,470 ---- *************** *** 610,614 **** public void testExtractWord() { String line = "Abc DEF GHHI"; - String word = Tag.extractWord(line); assertEquals("Word expected","ABC",Tag.extractWord(line)); String line2= "%\n "; --- 595,598 ---- |
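testExtractWord asserts that Tag.extractWord("Abc DEF GHHI") returns "ABC", that is, the first word of the text in upper case. The sketch below illustrates that contract under one assumption: whitespace is the only word delimiter. The real method may treat other characters differently. WordExtractor is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical sketch; WordExtractor is an illustrative name, not HTMLParser's Tag class.
final class WordExtractor {
    private WordExtractor() {}

    /** Returns the first whitespace-delimited token of line in upper case, or "" if line is blank. */
    static String extractWord(String line) {
        String trimmed = line.trim();
        int end = 0;
        while (end < trimmed.length() && !Character.isWhitespace(trimmed.charAt(end))) {
            end++;
        }
        // A fixed locale keeps upper-casing stable regardless of the JVM default locale.
        return trimmed.substring(0, end).toUpperCase(Locale.ENGLISH);
    }
}
```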
From: <so...@us...> - 2003-08-24 18:45:24
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests In directory sc8-pr-cvs1:/tmp/cvs-serv16321/src/org/htmlparser/tests/scannersTests Modified Files: TagScannerTest.java Log Message: removed unused local variables Index: TagScannerTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TagScannerTest.java,v retrieving revision 1.23 retrieving revision 1.24 diff -C2 -d -r1.23 -r1.24 *** TagScannerTest.java 11 Aug 2003 00:18:33 -0000 1.23 --- TagScannerTest.java 24 Aug 2003 18:45:21 -0000 1.24 *************** *** 110,121 **** public void testRemoveChars() { String test = "hello\nworld\n\tqsdsds"; - TagScanner scanner = new TagScanner() { - public Tag scan(Tag tag,String url,NodeReader reader,String currLine) { return null;} - public boolean evaluate(String s,TagScanner previousOpenScanner) { return false; } - public String []getID() { - - return null; - } - }; String result = ParserUtils.removeChars(test,'\n'); assertEquals("Removing Chars","helloworld\tqsdsds",result); --- 110,113 ---- |
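The test no longer needs the anonymous TagScanner because removeChars is a static helper on ParserUtils. The self-contained sketch below runs the same loop and reproduces the asserted result. It uses StringBuilder rather than StringBuffer because no synchronization is needed. CharRemover is a hypothetical name.

```java
// Hypothetical sketch; CharRemover is an illustrative name, not HTMLParser's ParserUtils.
final class CharRemover {
    private CharRemover() {}

    /** Returns s with every occurrence of occur removed, preserving the order of other characters. */
    static String removeChars(String s, char occur) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch != occur) {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```

On Java 5 and later, s.replace(String.valueOf(occur), "") gives the same result in one call.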
From: <so...@us...> - 2003-08-24 18:44:54
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests In directory sc8-pr-cvs1:/tmp/cvs-serv16199/src/org/htmlparser/tests/scannersTests Modified Files: TableScannerTest.java Log Message: reformatted Index: TableScannerTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TableScannerTest.java,v retrieving revision 1.29 retrieving revision 1.30 diff -C2 -d -r1.29 -r1.30 *** TableScannerTest.java 11 Aug 2003 00:18:33 -0000 1.29 --- TableScannerTest.java 24 Aug 2003 18:44:50 -0000 1.30 *************** *** 176,184 **** public void testOverFlow () throws ParserException { ! Parser parser; ! Node node; ! ! parser = new Parser("http://www.sec.gov/Archives/edgar/data/30554/000089322002000287/w57038e10-k.htm"); parser.addScanner(new TableScanner(parser)); for (NodeIterator e = parser.elements(); e.hasMoreNodes(); ) node = e.nextNode(); --- 176,185 ---- public void testOverFlow () throws ParserException { ! Parser parser = ! new Parser( ! "http://www.sec.gov/Archives/edgar/data/30554/000089322002000287/w57038e10-k.htm" ! ); parser.addScanner(new TableScanner(parser)); + Node node; for (NodeIterator e = parser.elements(); e.hasMoreNodes(); ) node = e.nextNode(); |
Update of /cvsroot/htmlparser/htmlparser/docs/docs In directory sc8-pr-cvs1:/tmp/cvs-serv15993/htmlparser/docs/docs Modified Files: LinkExtraction.html TextExtractingVisitor.html StringExtraction.html ReverseHtml.html ExternalIterators.html InternalIterators.html IteratorPattern.html PatternStories.html EnableFeedback.html FrequentlyAskedQuestions.html StrategyPattern.html TemplateMethod.html SamplePrograms.html ParsingXml.html LastName.html ParserDesign.html FirstName.html ImageExtraction.html FeedbackMechanism.html EmailExtraction.html UnitTestingPdf.html FactoryMethod.html UsingCookiesWithParser.html TagFindingVisitor.html SomikRaha.html CollectingParameter.html SearchingForData.html WebRipper.html TestDrivenDevelopment.html BlockFeedback.html CompositePattern.html TagScanner.html JavaBeans.html PostOperation.html CustomTagExtraction.html VisitorPattern.html WebCrawler.html WritingYourOwnScanners.html index.html Added Files: FullName.html UnitTestingXsl.html Log Message: update Wiki image --- NEW FILE: FullName.html --- <html><head><title>Full Name</title></head><body><DIV class="wikitext"> <P>Describe [<SPAN class="wikiunknown"><U>FullNa</U></SPAN></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, August 15, 2003 10:24:19 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> --- NEW FILE: UnitTestingXsl.html --- <html><head><title>Unit Testing Xsl</title></head><body><DIV class="wikitext"> <P>asdfDescribe <A class="wiki" HREF="UnitTestingXsl.html">UnitTestingXsl</A> here.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Tuesday, July 29, 2003 6:18:50 am.</P><HR noshade="noshade" class="toolbar"/></body></html> Index: LinkExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/LinkExtraction.html,v retrieving revision 1.2 retrieving 
revision 1.3 diff -C2 -d -r1.2 -r1.3 *** LinkExtraction.html 26 Apr 2003 03:58:34 -0000 1.2 --- LinkExtraction.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,6 **** ! <html><head><title>Link Extraction</title></head><body><DIV CLASS="wikitext"> <P><B>Link Extraction</B></P> <P>There are many ways of extracting links.</P> ! <P>1. Use the <SPAN CLASS="wikiunknown"><U>ObjectFindingVisitor</U></SPAN> to extract links, like so:</P> <PRE> Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children --- 1,6 ---- ! <html><head><title>Link Extraction</title></head><body><DIV class="wikitext"> <P><B>Link Extraction</B></P> <P>There are many ways of extracting links.</P> ! <P>1. Use the <SPAN class="wikiunknown"><U>ObjectFindingVisitor</U></SPAN> to extract links, like so:</P> <PRE> Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children *************** *** 82,84 **** // You can now get the data from the visitor interface. </PRE> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:22:44 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 82,84 ---- // You can now get the data from the visitor interface. </PRE> ! 
<P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:22:44 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: TextExtractingVisitor.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/TextExtractingVisitor.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** TextExtractingVisitor.html 26 Apr 2003 03:58:35 -0000 1.2 --- TextExtractingVisitor.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,4 **** ! <html><head><title>Text Extracting Visitor</title></head><body><DIV CLASS="wikitext"> ! <P>Describe <A CLASS="wiki" HREF="TextExtractingVisitor.html">TextExtractingVisitor</A> here.esto es la uni como todos decimos ! es lo mejor</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Friday, March 28, 2003 12:39:55 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,4 ---- ! <html><head><title>Text Extracting Visitor</title></head><body><DIV class="wikitext"> ! <P>Describe <A class="wiki" HREF="TextExtractingVisitor.html">TextExtractingVisitor</A> here.esto es la uni como todos decimos ! 
es lo mejor</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, March 28, 2003 12:39:55 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: StringExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/StringExtraction.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** StringExtraction.html 26 Apr 2003 03:58:35 -0000 1.3 --- StringExtraction.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,5 **** ! <html><head><title>String Extraction</title></head><body><DIV CLASS="wikitext"> <P><B>String Extraction</B></P> ! <P>To get all the text content from a web page, use the <A CLASS="wiki" HREF="TextExtractingVisitor.html">TextExtractingVisitor</A>, like so :</P> <PRE> Parser parser = new Parser("http://pageIwantToParse.com"); TextExtractingVisitor visitor = new TextExtractingVisitor(); --- 1,5 ---- ! <html><head><title>String Extraction</title></head><body><DIV class="wikitext"> <P><B>String Extraction</B></P> ! <P>To get all the text content from a web page, use the <A class="wiki" HREF="TextExtractingVisitor.html">TextExtractingVisitor</A>, like so :</P> <PRE> Parser parser = new Parser("http://pageIwantToParse.com"); TextExtractingVisitor visitor = new TextExtractingVisitor(); *************** *** 10,12 **** ParserUtils.removeEscapeCharacters( visitor.getExtractedText() ! );</PRE></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:20:23 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 10,12 ---- ParserUtils.removeEscapeCharacters( visitor.getExtractedText() ! 
);</PRE></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:20:23 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: ReverseHtml.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ReverseHtml.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ReverseHtml.html 26 Apr 2003 03:58:35 -0000 1.2 --- ReverseHtml.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Reverse Html</title></head><body><DIV CLASS="wikitext"> <P><B>Reverse Html Rendering</B></P> <P>In order to get back the html representation of a web page, you may use toHTML() recursively. Here's one way to get it:</P> --- 1,3 ---- ! <html><head><title>Reverse Html</title></head><body><DIV class="wikitext"> <P><B>Reverse Html Rendering</B></P> <P>In order to get back the html representation of a web page, you may use toHTML() recursively. Here's one way to get it:</P> *************** *** 18,21 **** <PRE>tag.setAttribute(Tag.TAGNAME,newTagName);</PRE> <P>This should enable you to perform any transformations on the html. ! Take a look at another way of modifying tags in <A CLASS="wiki" HREF="WebRipper.html">WebRipper</A>.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:34:12 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 18,21 ---- <PRE>tag.setAttribute(Tag.TAGNAME,newTagName);</PRE> <P>This should enable you to perform any transformations on the html. ! Take a look at another way of modifying tags in <A class="wiki" HREF="WebRipper.html">WebRipper</A>.</P> ! 
<P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:34:12 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: ExternalIterators.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ExternalIterators.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ExternalIterators.html 26 Apr 2003 03:58:34 -0000 1.2 --- ExternalIterators.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>External Iterators</title></head><body><DIV CLASS="wikitext"> <P><B>External Iterators</B></P> <P>You can use external iterators to drive the entire parsing process like so :</P> --- 1,3 ---- ! <html><head><title>External Iterators</title></head><body><DIV class="wikitext"> <P><B>External Iterators</B></P> <P>You can use external iterators to drive the entire parsing process like so :</P> *************** *** 10,12 **** }</PRE> <P>You should think of this only when you want to conduct a really quick search, and the moment you've found what you've wanted, you want to stop parsing. The iterator here drives the parsing.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:36:09 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 10,12 ---- }</PRE> <P>You should think of this only when you want to conduct a really quick search, and the moment you've found what you've wanted, you want to stop parsing. The iterator here drives the parsing.</P> ! 
<P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:36:09 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: InternalIterators.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/InternalIterators.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** InternalIterators.html 26 Apr 2003 03:58:34 -0000 1.2 --- InternalIterators.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,4 **** ! <html><head><title>Internal Iterators</title></head><body><DIV CLASS="wikitext"> <P><B>Internal Iterators</B></P> ! <P>You can use internal iterators by overriding trigger methods that you're interested in. This is done by subclassing HTMLVisitor. An example can be found in <A CLASS="wiki" HREF="LinkExtraction.html">LinkExtraction</A>.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 16, 2003 4:08:46 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,4 ---- ! <html><head><title>Internal Iterators</title></head><body><DIV class="wikitext"> <P><B>Internal Iterators</B></P> ! <P>You can use internal iterators by overriding trigger methods that you're interested in. This is done by subclassing HTMLVisitor. An example can be found in <A class="wiki" HREF="LinkExtraction.html">LinkExtraction</A>.</P> ! 
<P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 16, 2003 4:08:46 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: IteratorPattern.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/IteratorPattern.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** IteratorPattern.html 26 Apr 2003 03:58:34 -0000 1.2 --- IteratorPattern.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,6 **** ! <html><head><title>Iterator Pattern</title></head><body><DIV CLASS="wikitext"> <P><B>Iterator Pattern</B></P> ! <P>The Iterator can be seen in action in two of its flavors - <A CLASS="wiki" HREF="ExternalIterators.html">ExternalIterators</A>, and <A CLASS="wiki" HREF="InternalIterators.html">InternalIterators</A>. The <I>HTMLEnumeration</I> class provides the external iteration facility. ! <I><SPAN CLASS="wikiunknown"><U>SimpleEnumeration</U></SPAN></I> allows external iteration over <I><SPAN CLASS="wikiunknown"><U>NodeList</U></SPAN></I>s.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 16, 2003 5:04:10 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,6 ---- ! <html><head><title>Iterator Pattern</title></head><body><DIV class="wikitext"> <P><B>Iterator Pattern</B></P> ! <P>The Iterator can be seen in action in two of its flavors - <A class="wiki" HREF="ExternalIterators.html">ExternalIterators</A>, and <A class="wiki" HREF="InternalIterators.html">InternalIterators</A>. The <I>HTMLEnumeration</I> class provides the external iteration facility. ! 
<I><SPAN class="wikiunknown"><U>SimpleEnumeration</U></SPAN></I> allows external iteration over <I><SPAN class="wikiunknown"><U>NodeList</U></SPAN></I>s.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 16, 2003 5:04:10 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: PatternStories.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/PatternStories.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** PatternStories.html 25 May 2003 19:30:14 -0000 1.3 --- PatternStories.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,12 **** ! <html><head><title>Pattern Stories</title></head><body><DIV CLASS="wikitext"> <P><B>Pattern Stories</B></P> <P>The parser uses the following patterns:</P> <UL> ! <LI><A CLASS="wiki" HREF="FactoryMethod.html">FactoryMethod</A></LI> ! <LI><A CLASS="wiki" HREF="TemplateMethod.html">TemplateMethod</A></LI> ! <LI><A CLASS="wiki" HREF="IteratorPattern.html">IteratorPattern</A></LI> ! <LI><A CLASS="wiki" HREF="VisitorPattern.html">VisitorPattern</A></LI> ! <LI><A CLASS="wiki" HREF="CollectingParameter.html">CollectingParameter</A></LI> ! <LI><A CLASS="wiki" HREF="StrategyPattern.html">StrategyPattern</A></LI> ! <LI><A CLASS="wiki" HREF="CompositePattern.html">CompositePattern</A></LI></UL> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Friday, May 16, 2003 2:30:12 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,12 ---- ! 
<html><head><title>Pattern Stories</title></head><body><DIV class="wikitext"> <P><B>Pattern Stories</B></P> <P>The parser uses the following patterns:</P> <UL> ! <LI><A class="wiki" HREF="FactoryMethod.html">FactoryMethod</A></LI> ! <LI><A class="wiki" HREF="TemplateMethod.html">TemplateMethod</A></LI> ! <LI><A class="wiki" HREF="IteratorPattern.html">IteratorPattern</A></LI> ! <LI><A class="wiki" HREF="VisitorPattern.html">VisitorPattern</A></LI> ! <LI><A class="wiki" HREF="CollectingParameter.html">CollectingParameter</A></LI> ! <LI><A class="wiki" HREF="StrategyPattern.html">StrategyPattern</A></LI> ! <LI><A class="wiki" HREF="CompositePattern.html">CompositePattern</A></LI></UL> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, May 16, 2003 2:30:12 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: EnableFeedback.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/EnableFeedback.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** EnableFeedback.html 26 Apr 2003 03:58:34 -0000 1.2 --- EnableFeedback.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Enable Feedback</title></head><body><DIV CLASS="wikitext"> <P><B>Enable Feedback</B></P> <P>If the parser needs to be switched to normal or debug mode, you can do this like so:</P> --- 1,3 ---- ! <html><head><title>Enable Feedback</title></head><body><DIV class="wikitext"> <P><B>Enable Feedback</B></P> <P>If the parser needs to be switched to normal or debug mode, you can do this like so:</P> *************** *** 18,21 **** ); </PRE> ! <P>You can also turn the feedback to QUIET mode (none of the events will be triggered), to get extra details. 
Check <A CLASS="wiki" HREF="BlockFeedback.html">BlockFeedback</A>. To handle the feedback yourself, without displaying it to standard output, subclass <SPAN CLASS="wikiunknown"><U>ParserFeedback</U></SPAN>, and override <I>info()</I>, <I>warning()</I> and <I>error()</I>.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:41:24 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 18,21 ---- ); </PRE> ! <P>You can also turn the feedback to QUIET mode (none of the events will be triggered), to get extra details. Check <A class="wiki" HREF="BlockFeedback.html">BlockFeedback</A>. To handle the feedback yourself, without displaying it to standard output, subclass <SPAN class="wikiunknown"><U>ParserFeedback</U></SPAN>, and override <I>info()</I>, <I>warning()</I> and <I>error()</I>.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:41:24 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: FrequentlyAskedQuestions.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FrequentlyAskedQuestions.html,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** FrequentlyAskedQuestions.html 13 Jul 2003 11:40:58 -0000 1.4 --- FrequentlyAskedQuestions.html 24 Aug 2003 18:44:10 -0000 1.5 *************** *** 1,3 **** ! <html><head><title>Frequently Asked Questions</title></head><body><DIV CLASS="wikitext"> <P><B>FAQ</B></P><HR/> <P><B>How does the parser deal with tags like <tag/> ?</B></P> --- 1,3 ---- ! 
<html><head><title>Frequently Asked Questions</title></head><body><DIV class="wikitext"> <P><B>FAQ</B></P><HR/> <P><B>How does the parser deal with tags like <tag/> ?</B></P> *************** *** 5,7 **** <P><B>How does the parser deal with HTML tags which should be terminated with /> but are not, i.e. <BR/> and <HR>? Is there any way to automatically know that some HTML tags are empty?</B></P><HR/> <P><B>How is JSP parsed using the HTMLParser?</B></P><HR/> ! <P><B>How do you find the byte offset from the beginning of a document for a tag?</B></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Thursday, June 19, 2003 10:49:11 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 5,7 ---- <P><B>How does the parser deal with HTML tags which should be terminated with /> but are not, i.e. <BR/> and <HR>? Is there any way to automatically know that some HTML tags are empty?</B></P><HR/> <P><B>How is JSP parsed using the HTMLParser?</B></P><HR/> ! <P><B>How do you find the byte offset from the beginning of a document for a tag?</B></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, June 19, 2003 10:49:11 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: StrategyPattern.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/StrategyPattern.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** StrategyPattern.html 26 Apr 2003 03:58:35 -0000 1.2 --- StrategyPattern.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,4 **** ! <html><head><title>Strategy Pattern</title></head><body><DIV CLASS="wikitext"> <P><B>Strategy Pattern</B></P> ! <P>The strategy is used in specifying the type of feedback object to be used in the parser. 
It defaults to a feedback object that sends feedback to standard output. Check <A CLASS="wiki" HREF="BlockFeedback.html">BlockFeedback</A> for more details.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 16, 2003 5:10:51 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,4 ---- ! <html><head><title>Strategy Pattern</title></head><body><DIV class="wikitext"> <P><B>Strategy Pattern</B></P> ! <P>The strategy is used in specifying the type of feedback object to be used in the parser. It defaults to a feedback object that sends feedback to standard output. Check <A class="wiki" HREF="BlockFeedback.html">BlockFeedback</A> for more details.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 16, 2003 5:10:51 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: TemplateMethod.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/TemplateMethod.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** TemplateMethod.html 26 Apr 2003 03:58:35 -0000 1.3 --- TemplateMethod.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,5 **** ! <html><head><title>Template Method</title></head><body><DIV CLASS="wikitext"> <P><B>Template Method</B></P> ! <P><I><A CLASS="wiki" HREF="TagScanner.html">TagScanner</A></I> uses a template method to create a scanned node - it calls a matching tag scanner to do its job and produce a scanned node in a series of steps.</P> <PRE> public final Tag createScannedNode( Tag tag, --- 1,5 ---- ! 
<html><head><title>Template Method</title></head><body><DIV class="wikitext"> <P><B>Template Method</B></P> ! <P><I><A class="wiki" HREF="TagScanner.html">TagScanner</A></I> uses a template method to create a scanned node - it calls a matching tag scanner to do its job and produce a scanned node in a series of steps.</P> <PRE> public final Tag createScannedNode( Tag tag, *************** *** 14,16 **** }</PRE> <P>This is useful as the tag has some necessary associations established here - the association with the scanner that parsed it, and the attributes of the original tag that was processed to produce the specialized tag.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:38:42 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 14,16 ---- }</PRE> <P>This is useful as the tag has some necessary associations established here - the association with the scanner that parsed it, and the attributes of the original tag that was processed to produce the specialized tag.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:38:42 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: SamplePrograms.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/SamplePrograms.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** SamplePrograms.html 26 Apr 2003 03:58:35 -0000 1.3 --- SamplePrograms.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,12 **** ! <html><head><title>Sample Programs</title></head><body><DIV CLASS="wikitext"> <P><B>Sample Progams</B></P> <UL> ! 
<LI><A CLASS="wiki" HREF="StringExtraction.html">StringExtraction</A></LI> ! <LI><A CLASS="wiki" HREF="LinkExtraction.html">LinkExtraction</A> (includes example of customized parsing with HTMLVisitor)</LI> ! <LI><A CLASS="wiki" HREF="EmailExtraction.html">EmailExtraction</A></LI> ! <LI><A CLASS="wiki" HREF="ImageExtraction.html">ImageExtraction</A></LI> ! <LI><A CLASS="wiki" HREF="WebCrawler.html">WebCrawler</A></LI> ! <LI><A CLASS="wiki" HREF="WebRipper.html">WebRipper</A></LI> ! <LI><A CLASS="named-wiki" HREF="ReverseHtml.html" TITLE="ReverseHtml">ReverseHtml rendering</A></LI> ! <LI><A CLASS="wiki" HREF="CustomTagExtraction.html">CustomTagExtraction</A></LI> ! <LI><A CLASS="wiki" HREF="JavaBeans.html">JavaBeans</A></LI></UL></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Thursday, April 24, 2003 4:45:21 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,12 ---- ! <html><head><title>Sample Programs</title></head><body><DIV class="wikitext"> <P><B>Sample Progams</B></P> <UL> ! <LI><A class="wiki" HREF="StringExtraction.html">StringExtraction</A></LI> ! <LI><A class="wiki" HREF="LinkExtraction.html">LinkExtraction</A> (includes example of customized parsing with HTMLVisitor)</LI> ! <LI><A class="wiki" HREF="EmailExtraction.html">EmailExtraction</A></LI> ! <LI><A class="wiki" HREF="ImageExtraction.html">ImageExtraction</A></LI> ! <LI><A class="wiki" HREF="WebCrawler.html">WebCrawler</A></LI> ! <LI><A class="wiki" HREF="WebRipper.html">WebRipper</A></LI> ! <LI><A class="named-wiki" HREF="ReverseHtml.html" title="ReverseHtml">ReverseHtml rendering</A></LI> ! <LI><A class="wiki" HREF="CustomTagExtraction.html">CustomTagExtraction</A></LI> ! 
<LI><A class="wiki" HREF="JavaBeans.html">JavaBeans</A></LI></UL></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, April 24, 2003 4:45:21 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: ParsingXml.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ParsingXml.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** ParsingXml.html 27 Jul 2003 13:20:27 -0000 1.2 --- ParsingXml.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,11 **** ! <html><head><title>Parsing Xml</title></head><body><DIV CLASS="wikitext"> ! <P><?xml version="1.0" encoding="iso-8859-1" ?></P><BLOCKQUOTE STYLE="border-left-width: medium; border-left-color: #0f0; border-left-style: ridge; padding-left: 1em; margin-left: 0em; margin-right: 0em;"> <BLOCKQUOTE> ! <P><<SPAN CLASS="wikiunknown"><U>ReviewerInformation</U></SPAN>></P> <P><Reviewer></P> <P><PeopleID>9</PeopleID></P> ! <P><<A CLASS="wiki" HREF="FirstName.html">FirstName</A>>Niall</<A CLASS="wiki" HREF="FirstName.html">FirstName</A>></P> ! <P><<A CLASS="wiki" HREF="LastName.html">LastName</A>>Adams</<A CLASS="wiki" HREF="LastName.html">LastName</A>></P> ! <P><<SPAN CLASS="wikiunknown"><U>FullName</U></SPAN>>Niall Adams</<SPAN CLASS="wikiunknown"><U>FullName</U></SPAN>></P> <P><Organization>Imperial College</Organization></P> [...3375 lines suppressed...] *** 1806,1813 **** <P><City>Foster City</City></P> <P><State>CA</State></P> ! <P><<SPAN CLASS="wikiunknown"><U>PostalCode</U></SPAN>>94404</<SPAN CLASS="wikiunknown"><U>PostalCode</U></SPAN>></P> <P><Country>USA</Country></P> <P><Phone>650-349-0500 Ext 148</Phone></P> <P><Fax>509-479-4522</Fax></P> <P></Reviewer></P> ! 
<P></<SPAN CLASS="wikiunknown"><U>ReviewerInformation</U></SPAN>></P></BLOCKQUOTE></BLOCKQUOTE></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Tuesday, June 24, 2003 1:32:51 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1806,1813 ---- <P><City>Foster City</City></P> <P><State>CA</State></P> ! <P><<SPAN class="wikiunknown"><U>PostalCode</U></SPAN>>94404</<SPAN class="wikiunknown"><U>PostalCode</U></SPAN>></P> <P><Country>USA</Country></P> <P><Phone>650-349-0500 Ext 148</Phone></P> <P><Fax>509-479-4522</Fax></P> <P></Reviewer></P> ! <P></<SPAN class="wikiunknown"><U>ReviewerInformation</U></SPAN>></P></BLOCKQUOTE></BLOCKQUOTE></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Tuesday, June 24, 2003 1:32:51 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: LastName.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/LastName.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** LastName.html 27 Jul 2003 13:20:30 -0000 1.1 --- LastName.html 24 Aug 2003 18:44:10 -0000 1.2 *************** *** 1,2 **** ! <html><head><title>Last Name</title></head><body><DIV CLASS="wikitext"> ! <P>Describe <A CLASS="wiki" HREF="LastName.html">LastName</A> here.fdsadfsafdsaf</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Thursday, July 17, 2003 4:38:05 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,2 ---- ! <html><head><title>Last Name</title></head><body><DIV class="wikitext"> ! 
<P>Describe <A class="wiki" HREF="LastName.html">LastName</A> here.fdsadfsafdsaf</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, July 17, 2003 4:38:05 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: ParserDesign.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ParserDesign.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** ParserDesign.html 26 Apr 2003 03:58:35 -0000 1.3 --- ParserDesign.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>Parser Design</title></head><body><DIV CLASS="wikitext"> <P><B>Parser Design</B></P> <P>HTMLParser is a SAX-like parser streaming parser, that has the capability to correct dirty-html on the fly. It is extremely fast and lightweight. The binary distribution of the jar file is around 135 KB only, and it can easily be brought down to 65 KB for a minimal parsing requirement (prior to optimization and obfuscation).</P> ! <P>It is also extensible. The parser provides both <A CLASS="wiki" HREF="InternalIterators.html">InternalIterators</A> and <A CLASS="wiki" HREF="ExternalIterators.html">ExternalIterators</A>. ! The parser has some interesting <A CLASS="wiki" HREF="PatternStories.html">PatternStories</A>..</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Monday, March 17, 2003 6:18:45 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,6 ---- ! <html><head><title>Parser Design</title></head><body><DIV class="wikitext"> <P><B>Parser Design</B></P> <P>HTMLParser is a SAX-like parser streaming parser, that has the capability to correct dirty-html on the fly. It is extremely fast and lightweight. 
The binary distribution of the jar file is around 135 KB only, and it can easily be brought down to 65 KB for a minimal parsing requirement (prior to optimization and obfuscation).</P> ! <P>It is also extensible. The parser provides both <A class="wiki" HREF="InternalIterators.html">InternalIterators</A> and <A class="wiki" HREF="ExternalIterators.html">ExternalIterators</A>. ! The parser has some interesting <A class="wiki" HREF="PatternStories.html">PatternStories</A>..</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Monday, March 17, 2003 6:18:45 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: FirstName.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FirstName.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** FirstName.html 27 Jul 2003 13:20:30 -0000 1.1 --- FirstName.html 24 Aug 2003 18:44:10 -0000 1.2 *************** *** 1,2 **** ! <html><head><title>First Name</title></head><body><DIV CLASS="wikitext"> ! <P>Describe <A CLASS="wiki" HREF="FirstName.html">FirstName</A> here.</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Thursday, July 17, 2003 4:35:59 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,2 ---- ! <html><head><title>First Name</title></head><body><DIV class="wikitext"> ! 
<P>Describe <A class="wiki" HREF="FirstName.html">FirstName</A> here.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Thursday, July 17, 2003 4:35:59 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: ImageExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/ImageExtraction.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** ImageExtraction.html 13 Jul 2003 11:40:58 -0000 1.3 --- ImageExtraction.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,6 **** ! <html><head><title>Image Extraction</title></head><body><DIV CLASS="wikitext"> <P><B>Image Extractions</B></P> ! <P>This is very similar to <A CLASS="wiki" HREF="LinkExtraction.html">LinkExtraction</A>.</P> ! <P>1. Use the <I><SPAN CLASS="wikiunknown"><U>ObjectFindingVisitor</U></SPAN></I> like so :</P> <PRE>Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children --- 1,6 ---- ! <html><head><title>Image Extraction</title></head><body><DIV class="wikitext"> <P><B>Image Extractions</B></P> ! <P>This is very similar to <A class="wiki" HREF="LinkExtraction.html">LinkExtraction</A>.</P> ! <P>1. Use the <I><SPAN class="wikiunknown"><U>ObjectFindingVisitor</U></SPAN></I> like so :</P> <PRE>Parser parser = new Parser("http://urlIWantToParse.com"); // Create a visitor, specify that you want to recurse through its children *************** *** 31,33 **** System.out.println(imageTag.getImageLocation()); }</PRE> ! 
<P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A>, Sunday, February 16, 2003 2:02:18 pm.</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Wednesday, June 25, 2003 9:11:46 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 31,33 ---- System.out.println(imageTag.getImageLocation()); }</PRE> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A>, Sunday, February 16, 2003 2:02:18 pm.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Wednesday, June 25, 2003 9:11:46 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: FeedbackMechanism.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FeedbackMechanism.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** FeedbackMechanism.html 26 Apr 2003 03:58:34 -0000 1.2 --- FeedbackMechanism.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Feedback Mechanism</title></head><body><DIV CLASS="wikitext"> <P><B>Feedback Mechanism</B></P> <P>The parser has a feedback mechanism that allows you to obtain feedback about the parsing process. You can get to know if there were any errors, or any warnings, or any general information. Warnings occur when the parser has encountered dirty html, but was able to fix it and continue. Errors occur when the parser was not able to handle the html.</P> --- 1,3 ---- ! <html><head><title>Feedback Mechanism</title></head><body><DIV class="wikitext"> <P><B>Feedback Mechanism</B></P> <P>The parser has a feedback mechanism that allows you to obtain feedback about the parsing process. You can get to know if there were any errors, or any warnings, or any general information. 
Warnings occur when the parser has encountered dirty html, but was able to fix it and continue. Errors occur when the parser was not able to handle the html.</P> *************** *** 28,30 **** } }</PRE> ! <P>You can supply an object of this type to the parser in the constructor, and accordingly channel the feedback.</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Friday, March 21, 2003 11:51:12 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 28,30 ---- } }</PRE> ! <P>You can supply an object of this type to the parser in the constructor, and accordingly channel the feedback.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, March 21, 2003 11:51:12 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: EmailExtraction.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/EmailExtraction.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** EmailExtraction.html 26 Apr 2003 03:58:34 -0000 1.2 --- EmailExtraction.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Email Extraction</title></head><body><DIV CLASS="wikitext"> <P><B>Email Extraction</B></P> <P>This is very similar to link extraction. You have to extract links from a page and verify that they are email addresses. Link tags have a method - <I>isMailLink()</I></P> --- 1,3 ---- ! <html><head><title>Email Extraction</title></head><body><DIV class="wikitext"> <P><B>Email Extraction</B></P> <P>This is very similar to link extraction. You have to extract links from a page and verify that they are email addresses. Link tags have a method - <I>isMailLink()</I></P> *************** *** 12,14 **** } }</PRE> ! 
<P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A>, February 16, 2003 11:41 am</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:24:25 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 12,14 ---- } }</PRE> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A>, February 16, 2003 11:41 am</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:24:25 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: UnitTestingPdf.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/UnitTestingPdf.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** UnitTestingPdf.html 26 Apr 2003 03:58:35 -0000 1.2 --- UnitTestingPdf.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,2 **** ! <html><head><title>Unit Testing Pdf</title></head><body><DIV CLASS="wikitext"> ! <P>Describe <A CLASS="wiki" HREF="UnitTestingPdf.html">UnitTestingPdf</A> here. How are you?</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Friday, April 11, 2003 11:47:02 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,2 ---- ! <html><head><title>Unit Testing Pdf</title></head><body><DIV class="wikitext"> ! <P>Describe <A class="wiki" HREF="UnitTestingPdf.html">UnitTestingPdf</A> here. 
How are you?</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Friday, April 11, 2003 11:47:02 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: FactoryMethod.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/FactoryMethod.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** FactoryMethod.html 26 Apr 2003 03:58:34 -0000 1.3 --- FactoryMethod.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,9 **** ! <html><head><title>Factory Method</title></head><body><DIV CLASS="wikitext"> <P><B>Factory Method</B></P> ! <P><I><A CLASS="wiki" HREF="TagScanner.html">TagScanner</A></I> possess an FM for the creation of a tag.</P> <PRE> protected Tag createTag(TagData tagData);</PRE> <P>Scanner subclasses override this to specify the type of tag to be constructed.</P> ! <P><I><SPAN CLASS="wikiunknown"><U>CompositeTagScanner</U></SPAN></I> possesses an FM for the creation of a tag.</P> <PRE> protected Tag createTag(TagData tagData,CompositeTagData compositeTagData);</PRE> <P>Composite scanners override this to specify the type of tag to be constructed.</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:37:36 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,9 ---- ! <html><head><title>Factory Method</title></head><body><DIV class="wikitext"> <P><B>Factory Method</B></P> ! <P><I><A class="wiki" HREF="TagScanner.html">TagScanner</A></I> possess an FM for the creation of a tag.</P> <PRE> protected Tag createTag(TagData tagData);</PRE> <P>Scanner subclasses override this to specify the type of tag to be constructed.</P> ! 
<P><I><SPAN class="wikiunknown"><U>CompositeTagScanner</U></SPAN></I> possesses an FM for the creation of a tag.</P> <PRE> protected Tag createTag(TagData tagData,CompositeTagData compositeTagData);</PRE> <P>Composite scanners override this to specify the type of tag to be constructed.</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:37:36 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: UsingCookiesWithParser.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/UsingCookiesWithParser.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** UsingCookiesWithParser.html 26 Apr 2003 03:58:35 -0000 1.3 --- UsingCookiesWithParser.html 24 Aug 2003 18:44:10 -0000 1.4 *************** *** 1,10 **** ! <html><head><title>Using Cookies With Parser</title></head><body><DIV CLASS="wikitext"> <P><B>Using Cookies with the Parser</B></P> ! <P><B>Problem:</B> (by <SPAN CLASS="wikiunknown"><U>ShanSivakolundhu</U></SPAN>)<BR/>In order to access a particular site I neet to have a cookie set. Is there any way I can set the cookie before I create a parser object ? Just like ...</P> <PRE>URLConnection.("Cookie", cookieValue); URLConnection.connect();</PRE> ! <P><B>Solution:</B> (by <SPAN CLASS="wikiunknown"><U>BobLewis</U></SPAN>)<BR/>In order to send cookies in your Http requests, all you need to do is set the Cookie HTTP Header in the URL Connection.</P> --- 1,10 ---- ! <html><head><title>Using Cookies With Parser</title></head><body><DIV class="wikitext"> <P><B>Using Cookies with the Parser</B></P> ! <P><B>Problem:</B> (by <SPAN class="wikiunknown"><U>ShanSivakolundhu</U></SPAN>)<BR/>In order to access a particular site I neet to have a cookie set. 
Is there any way I can set the cookie before I create a parser object ? Just like ...</P> <PRE>URLConnection.("Cookie", cookieValue); URLConnection.connect();</PRE> ! <P><B>Solution:</B> (by <SPAN class="wikiunknown"><U>BobLewis</U></SPAN>)<BR/>In order to send cookies in your Http requests, all you need to do is set the Cookie HTTP Header in the URL Connection.</P> *************** *** 26,30 **** reader = new HTMLReader(isr, 8192); parser = new HTMLParser(reader, feedback);</PRE> ! <P>The <SPAN CLASS="wikiunknown"><U>HttpUtil</U></SPAN>.getCharacterSet method used above is basically just taken from the method of the same name in the HTMLParser class. That method is protected, so --- 26,30 ---- reader = new HTMLReader(isr, 8192); parser = new HTMLParser(reader, feedback);</PRE> ! <P>The <SPAN class="wikiunknown"><U>HttpUtil</U></SPAN>.getCharacterSet method used above is basically just taken from the method of the same name in the HTMLParser class. That method is protected, so *************** *** 68,70 **** } return buf.toString(); ! }</PRE></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Wednesday, April 2, 2003 3:04:24 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 68,70 ---- } return buf.toString(); ! }</PRE></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Wednesday, April 2, 2003 3:04:24 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: TagFindingVisitor.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/TagFindingVisitor.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** TagFindingVisitor.html 25 May 2003 19:30:14 -0000 1.1 --- TagFindingVisitor.html 24 Aug 2003 18:44:10 -0000 1.2 *************** *** 1,2 **** ! 
<html><head><title>Tag Finding Visitor</title></head><body><DIV CLASS="wikitext"> ! <P>Describe <A CLASS="wiki" HREF="TagFindingVisitor.html">TagFindingVisitor</A> here.</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Tuesday, May 6, 2003 1:43:25 am.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,2 ---- ! <html><head><title>Tag Finding Visitor</title></head><body><DIV class="wikitext"> ! <P>Describe <A class="wiki" HREF="TagFindingVisitor.html">TagFindingVisitor</A> here.</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Tuesday, May 6, 2003 1:43:25 am.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: SomikRaha.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/SomikRaha.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** SomikRaha.html 26 Apr 2003 03:58:35 -0000 1.2 --- SomikRaha.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,5 **** ! <html><head><title>Somik Raha</title></head><body><DIV CLASS="wikitext"> <BLOCKQUOTE> <P>Welcome to my space on the HTMLParser wiki. I founded the HTMLParser project two years back, under the confusion that there were no other html parsers around. Since then, it has been one exciting journey - made possible because of the vast feedback that has been pouring in, that really shows the power of the open-source movement.</P> ! <P>You can visit my <A CLASS="namedurl" HREF="http://www.geocities.com/somik/index.html"><SPAN STYLE="white-space: nowrap">home</SPAN> page</A> to see some of my other work.</P></BLOCKQUOTE> ! 
<P>-</P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 9:16:16 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 1,5 ---- ! <html><head><title>Somik Raha</title></head><body><DIV class="wikitext"> <BLOCKQUOTE> <P>Welcome to my space on the HTMLParser wiki. I founded the HTMLParser project two years back, under the confusion that there were no other html parsers around. Since then, it has been one exciting journey - made possible because of the vast feedback that has been pouring in, that really shows the power of the open-source movement.</P> ! <P>You can visit my <A class="namedurl" href="http://www.geocities.com/somik/index.html"><SPAN style="white-space: nowrap">home</SPAN> page</A> to see some of my other work.</P></BLOCKQUOTE> ! <P>-</P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 9:16:16 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: CollectingParameter.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/CollectingParameter.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** CollectingParameter.html 26 Apr 2003 03:58:34 -0000 1.2 --- CollectingParameter.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Collecting Parameter</title></head><body><DIV CLASS="wikitext"> <P><B>Collecting Parameter</B></P> <P>The parser allows the use of a collecting parameter in two modes</P> --- 1,3 ---- ! 
<html><head><title>Collecting Parameter</title></head><body><DIV class="wikitext"> <P><B>Collecting Parameter</B></P> <P>The parser allows the use of a collecting parameter in two modes</P> *************** *** 6,8 **** <LI>Node.collectInto() during external iteration</LI></UL> <P>Either way, nodes are collected into the collecting parameter object if they satisfy a match criterion (usually the type).</P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:40:12 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 6,8 ---- <LI>Node.collectInto() during external iteration</LI></UL> <P>Either way, nodes are collected into the collecting parameter object if they satisfy a match criterion (usually the type).</P> ! <P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:40:12 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: SearchingForData.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/SearchingForData.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** SearchingForData.html 26 Apr 2003 03:59:31 -0000 1.1 --- SearchingForData.html 24 Aug 2003 18:44:10 -0000 1.2 *************** *** 1,3 **** ! <html><head><title>Searching For Data</title></head><body><DIV CLASS="wikitext"> <P>Searching for data is one of the most challenging tasks in a web page due to its seemingly unstructured (or badly structured) form. Complex searches are now possible with the parser in a simple to use API. Here's an example :</P> <P>We are looking at a page which has the following html:</P> --- 1,3 ---- ! 
<html><head><title>Searching For Data</title></head><body><DIV class="wikitext"> <P>Searching for data is one of the most challenging tasks in a web page due to its seemingly unstructured (or badly structured) form. Complex searches are now possible with the parser in a simple to use API. Here's an example :</P> <P>We are looking at a page which has the following html:</P> *************** *** 76,78 **** // The name is the second item in the column tag ! Node expectedName = nextColumn.childAt(1);</PRE></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Saturday, April 19, 2003 10:38:30 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 76,78 ---- // The name is the second item in the column tag ! Node expectedName = nextColumn.childAt(1);</PRE></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Saturday, April 19, 2003 10:38:30 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: WebRipper.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/WebRipper.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** WebRipper.html 26 Apr 2003 03:58:35 -0000 1.2 --- WebRipper.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Web Ripper</title></head><body><DIV CLASS="wikitext"> <P><B>Web Ripper</B></P> <P>A ripper is a program that downloads html content to your hard disk. It involves <B>modifying links and image locations</B> to point to locations in your hard disk.</P> --- 1,3 ---- ! <html><head><title>Web Ripper</title></head><body><DIV class="wikitext"> <P><B>Web Ripper</B></P> <P>A ripper is a program that downloads html content to your hard disk. 
It involves <B>modifying links and image locations</B> to point to locations in your hard disk.</P> *************** *** 8,11 **** writeToFile(visitor.getModifiedResult()); // you have to define writeToFile in your app program</PRE> <P>This visitor simply modifies the links it finds in the page with the prefix you have provided. It then passes back the representation of the page via <I>getModifiedResult()</I>.</P> ! <P>If you're dealing with frames, you might want to enhance this visitor to be able to modify links on the frame tags. In such a case, override visitTag(), and check if the tag is a <SPAN CLASS="wikiunknown"><U>FrameTag</U></SPAN> (Note, <SPAN CLASS="wikiunknown"><U>UrlModifyingVisitor</U></SPAN> will register link and image scanners only, so you will need to register the frame scanner seperately). Then, you can proceed to modify the src attribute, use <I>Tag.setAttribute()</I></P> ! <P>--<A CLASS="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV ID="actionbar" CLASS="toolbar"><HR NOSHADE="noshade" CLASS="printer"/><P CLASS="editdate">Last edited on Sunday, February 23, 2003 5:32:39 pm.</P><HR NOSHADE="noshade" CLASS="toolbar"/></body></html> \ No newline at end of file --- 8,11 ---- writeToFile(visitor.getModifiedResult()); // you have to define writeToFile in your app program</PRE> <P>This visitor simply modifies the links it finds in the page with the prefix you have provided. It then passes back the representation of the page via <I>getModifiedResult()</I>.</P> ! <P>If you're dealing with frames, you might want to enhance this visitor to be able to modify links on the frame tags. In such a case, override visitTag(), and check if the tag is a <SPAN class="wikiunknown"><U>FrameTag</U></SPAN> (Note, <SPAN class="wikiunknown"><U>UrlModifyingVisitor</U></SPAN> will register link and image scanners only, so you will need to register the frame scanner seperately). Then, you can proceed to modify the src attribute, use <I>Tag.setAttribute()</I></P> ! 
<P>--<A class="wiki" HREF="SomikRaha.html">SomikRaha</A></P></DIV><DIV id="actionbar" class="toolbar"><HR noshade="noshade" class="printer"/><P class="editdate">Last edited on Sunday, February 23, 2003 5:32:39 pm.</P><HR noshade="noshade" class="toolbar"/></body></html> \ No newline at end of file Index: TestDrivenDevelopment.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/docs/TestDrivenDevelopment.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** TestDrivenDevelopment.html 26 Apr 2003 03:58:35 -0000 1.2 --- TestDrivenDevelopment.html 24 Aug 2003 18:44:10 -0000 1.3 *************** *** 1,3 **** ! <html><head><title>Test Driven Development</title></head><body><DIV CLASS="wikitext"> <P><BIG><B>Test Driven Development</B></BIG></P> <P>Test-Driven development is a rewarding practice, that stands out in eXtreme Programming. All the other practices of XP can usually be compromised when dealing with distributed teams, but the one practice that can and must be followed always - is test-driven development (TDD).</P> --- 1,3 ---- ! <html><head><title>Test Driven Development</title></head><body><DIV class="wikitext"> <P><BIG><B>Test Driven Development</B></BIG></P> <P>Test-Driven development is a rewarding practice, that stands out in eXtreme Programming. All the other practices of XP can usually be compromised when dealing with distributed teams, but the one practice that can and must be followed always - is test-driven development (TDD).</P> *************** *** 12,24 **** <P>Its time to take a look at the test-framework in place that enables quick and easy testing - as we believe that testing should be painless. If it is too hard, we wouldnt do it. To begin with, as is the case with most Java XP projects of the day, we use JUnit. JUnit allows us to create suites of tests and run them automatically. 
In case you are new to unit testing, please make sure that you have read and tried out the examples from Test Infected - Programmers Love Writing Tests.</P> <P>Once you are comfortable with JUnit, you are ready to start writing tests for the parser. ! We provide you with a utility class - <SPAN CLASS="wikiunknown"><U>ParserTestCase</U></SPAN> - with which you can rig up a complex test pretty easily. This class is in the org.htmlparser... [truncated message content]
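The UsingCookiesWithParser page in the commit above writes the cookie step as `URLConnection.("Cookie", cookieValue);`, which is missing the method name. On `java.net.URLConnection` the call is `setRequestProperty`. What follows is a minimal Java sketch that assumes the cookie values are already known as name/value pairs. `CookieRequest`, `cookieHeader`, `addCookies` and the `session` cookie are hypothetical names. Handing the connection on to the parser is left out because the page's reader setup and `HttpUtil` helper are only partly quoted in the diff.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CookieRequest {

    /** Joins name/value pairs into one Cookie header value, e.g. "a=1; b=2". */
    public static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    /**
     * Sets the Cookie request header. URLConnection only accepts request
     * properties before it connects, so call this before connect() or
     * getInputStream(), and before giving the connection to the parser.
     */
    public static void addCookies(URLConnection connection, Map<String, String> cookies) {
        connection.setRequestProperty("Cookie", cookieHeader(cookies));
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://urlIWantToParse.com/").toURL();
        // openConnection() only creates the connection object; nothing is sent yet.
        URLConnection connection = url.openConnection();

        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("session", "abc123"); // hypothetical cookie name and value
        addCookies(connection, cookies);

        System.out.println(cookieHeader(cookies)); // prints: session=abc123
        // Reading from the connection now sends the Cookie header with the request.
    }
}
```

The header has to be in place before the connection is opened for reading, which is why the page sets it before the parser is created.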
From: <der...@us...> - 2003-08-24 18:44:13
Update of /cvsroot/htmlparser/htmlparser In directory sc8-pr-cvs1:/tmp/cvs-serv15993/htmlparser Modified Files: build.xml Log Message: update Wiki image Index: build.xml =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v retrieving revision 1.42 retrieving revision 1.43 diff -C2 -d -r1.42 -r1.43 *** build.xml 23 Aug 2003 01:33:06 -0000 1.42 --- build.xml 24 Aug 2003 18:44:10 -0000 1.43 *************** *** 2,6 **** Build Procedure - cd htmlparser ! - 'ant jars' generates new htmlparser.jar and htmllexer.jar in htmlparser/release/htmlparser1_4/lib Release Procedure --- 2,6 ---- Build Procedure - cd htmlparser ! - 'ant jar' generates new htmlparser.jar and htmllexer.jar in htmlparser/release/htmlparser1_4/lib Release Procedure *************** *** 13,21 **** that's why this step can't be automated - incorporate changes from ChangeLog into htmlparser/docs/changes under ! a heading like "Integration Build 1.4 - 20030713" ! - 'ant jar' generates new htmlparser.jar in htmlparser/release/htmlparser1_4/lib - 'ant test' compiles and runs the unit tests ! - 'rm /home/derrick/htmlparser/htmlparser_cvs/htmlparser/docs/docs/*' ! and 'rm /home/derrick/htmlparser/htmlparser_cvs/htmlparser/docs/docs/images/*' deletes local Wiki pages, of course any one else would have to adjust this and also the hard-coded path in WikiCapturer --- 13,21 ---- that's why this step can't be automated - incorporate changes from ChangeLog into htmlparser/docs/changes under ! a heading like "Integration Build 1.4 - 20030824" ! - 'ant jar' generates new htmlparser.jar and htmllexer.jar in htmlparser/release/htmlparser1_4/lib - 'ant test' compiles and runs the unit tests ! - 'rm /home/derrick/htmlparser_cvs/htmlparser/docs/docs/*' ! 
and 'rm /home/derrick/htmlparser_cvs/htmlparser/docs/docs/images/*' deletes local Wiki pages, of course any one else would have to adjust this and also the hard-coded path in WikiCapturer *************** *** 25,33 **** - perform a CVS update on htmlparser/docs/docs to identify new and changed files and commit them - that's why this step can't be automated ! - 'ant' updates the version headers, creates the jar file and doc files and zips ! everything into a file htmlparser/distribution/htmlparser1_4_20030727.zip - commit docs/changes, docs/docs, docs/docs/images and src/* using a reason of the form: ! Update version headers to 1.4-20030727 and update changelog. Sourceforge File Release Procedure --- 25,33 ---- - perform a CVS update on htmlparser/docs/docs to identify new and changed files and commit them - that's why this step can't be automated ! - 'ant clean htmlparser' updates the version headers, creates the jar file and doc files and zips ! everything into a file htmlparser/distribution/htmlparser1_4_20030824.zip - commit docs/changes, docs/docs, docs/docs/images and src/* using a reason of the form: ! Update version headers to 1.4-20030824 and update changelog. Sourceforge File Release Procedure *************** *** 38,45 **** ftp> cd incoming ftp> bin ! ftp> put htmlparser1_4_20030727.zip ftp> bye - add a release to the 'Integation Builds' package ! Admin-File Releases-Add Release, use a name of the form '1_4_20030727' - Step 1, 'Paste The Notes' (using numeric character references and character entity references because this is displayed as HTML) with a --- 38,45 ---- ftp> cd incoming ftp> bin ! ftp> put htmlparser1_4_20030824.zip ftp> bye - add a release to the 'Integation Builds' package ! 
  Admin-File Releases-Add Release, use a name of the form '1_4_2003824'
  - Step 1, 'Paste The Notes' (using numeric character references and character entity references because this is displayed as HTML) with a
***************
*** 50,54 ****
  Pending Bugs:
  - use the 'Upload Change Log:' field to specify the ChamgeLog file you edited
! - Step 2, check the checkbox of the htmlparser1_4_20030727.zip file from the list of files in the uploads section
  - Submit/Refresh

--- 50,54 ----
  Pending Bugs:
  - use the 'Upload Change Log:' field to specify the ChamgeLog file you edited
! - Step 2, check the checkbox of the htmlparser1_4_20030824.zip file from the list of files in the uploads section
  - Submit/Refresh

***************
*** 63,67 ****
  Submit News
  - from the project summary screen, select 'Submit News' and title it like:
! HTML Parser Integration Release 1.4-20030727
  - type in a summary of the changes made
  - SUBMIT
--- 63,67 ----
  Submit News
  - from the project summary screen, select 'Submit News' and title it like:
! HTML Parser Integration Release 1.4-20030824
  - type in a summary of the changes made
  - SUBMIT
***************
*** 197,205 ****

    <!-- Compile the java code in ${src} -->
!   <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**,org/htmlparser/util/Generate.java" debug="on" classpath="src:${commons-logging.jar}" />
  </target>

! <!-- Create the distribution of htmlparser.jar and htmllexer.jar -->
! <target name="jars" depends="compile" description="create htmlparser.jar and htmllexer.jar">
    <!-- Create the distribution directory -->
    <mkdir dir="${dist}/lib"/>
--- 197,266 ----

    <!-- Compile the java code in ${src} -->
!   <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**,org/htmlparser/util/Generate.java" debug="on" classpath="src:${commons-logging.jar}"/>
  </target>

! <target name="compilelexer" description="compile lexer java files">
!   <echo message="**********************************"/>
!   <echo message="* Compiling lexer.... *"/>
!   <echo message="**********************************"/>
!   <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}" target="1.1">
!     <include name="org/htmlparser/lexer/**/*.java"/>
!     <include name="org/htmlparser/AbstractNode.java"/>
!     <include name="org/htmlparser/Node.java"/>
!     <include name="org/htmlparser/util/ParserException.java"/>
!     <include name="org/htmlparser/util/ChainedException.java"/>
!     <include name="org/htmlparser/util/NodeList.java"/>
!     <include name="org/htmlparser/util/NodeIterator.java"/>
!     <include name="org/htmlparser/util/SimpleNodeIterator.java"/>
!     <include name="org/htmlparser/util/sort/**/*.java"/>
!     <include name="org/htmlparser/parserHelper/SpecialHashtable.class"/>
!   </javac>
! </target>
!
! <target name="compileparser" depends="compilelexer" description="compile parser java files">
!   <echo message="**********************************"/>
!   <echo message="* Compiling parser.... *"/>
!   <echo message="**********************************"/>
!   <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}">
!     <include name="org/htmlparser/**/*.java"/>
!     <exclude name="org/htmlparser/tests/**"/>
!     <exclude name="org/htmlparser/util/Generate.java"/>
!   </javac>
! </target>
!
! <!-- Create the distribution of htmlparser.jar and htmllexer.jar -->
! <target name="jar" depends="jarlexer,jarparser" description="create htmlparser.jar and htmllexer.jar"/>
!
! <!-- Create the distribution of htmllexer.jar -->
! <target name="jarlexer" depends="compilelexer" description="create htmllexer.jar">
!   <!-- Create the distribution directory -->
!   <mkdir dir="${dist}/lib"/>
!
!   <echo message="**********************************"/>
!   <echo message="* Creating htmllexer.jar.... *"/>
!   <echo message="**********************************"/>
!
!   <!-- Put classes and images into the htmllexer.jar file -->
!   <jar jarfile="${dist}/lib/htmllexer.jar"
!     basedir="${src}">
!     <include name="org/htmlparser/lexer/**/*.class"/>
!     <include name="org/htmlparser/AbstractNode.class"/>
!     <include name="org/htmlparser/Node.class"/>
!     <include name="org/htmlparser/util/ParserException.class"/>
!     <include name="org/htmlparser/util/ChainedException.class"/>
!     <include name="org/htmlparser/util/NodeList*.class"/>
!     <include name="org/htmlparser/util/NodeIterator.class"/>
!     <include name="org/htmlparser/util/SimpleNodeIterator.class"/>
!     <include name="org/htmlparser/util/sort/**/*.class"/>
!     <include name="org/htmlparser/parserHelper/SpecialHashtable.class"/>
!     <manifest>
!       <attribute name="Main-Class" value="org.htmlparser.lexer.Lexer"/>
!     </manifest>
!   </jar>
!
! </target>
!
! <!-- Create the distribution of htmlparser.jar -->
! <target name="jarparser" depends="compileparser" description="create htmlparser.jar">
    <!-- Create the distribution directory -->
    <mkdir dir="${dist}/lib"/>
***************
*** 233,261 ****
      </manifest>
    </jar>
-
-   <echo message="**********************************"/>
-   <echo message="* Creating htmllexer.jar.... *"/>
-   <echo message="**********************************"/>
-
-   <!-- Put classes and images into the htmllexer.jar file -->
-   <jar jarfile="${dist}/lib/htmllexer.jar"
-     basedir="${src}">
-     <include name="org/htmlparser/lexer/**/*.class"/>
-     <include name="org/htmlparser/AbstractNode.class"/>
-     <include name="org/htmlparser/Node.class"/>
-     <include name="org/htmlparser/util/ParserException.class"/>
-     <include name="org/htmlparser/util/ChainedException.class"/>
-     <include name="org/htmlparser/util/sort/**/*.class"/>
-     <!-- to be removed -->
-     <include name="org/htmlparser/parserHelper/SpecialHashtable.class"/>
-     <manifest>
-       <attribute name="Main-Class" value="org.htmlparser.lexer.Lexer"/>
-     </manifest>
-   </jar>
-
  </target>

  <!-- Run the unit tests -->
! <target name="test" depends="jars" description="run the JUnit tests">
    <echo message="**********************************"/>
    <echo message="* Running unit tests.... *"/>
--- 294,301 ----
      </manifest>
    </jar>
  </target>

  <!-- Run the unit tests -->
! <target name="test" depends="jar" description="run the JUnit tests">
    <echo message="**********************************"/>
    <echo message="* Running unit tests.... *"/>
***************
*** 352,356 ****

  <!-- The release directory structuring finishes here -->
! <target name="Release" depends="versionSource,jars,javadoc,CopyBatch" description="prepare the release files">
  </target>

--- 392,396 ----

  <!-- The release directory structuring finishes here -->
! <target name="Release" depends="versionSource,jar,javadoc,CopyBatch" description="prepare the release files">
  </target>

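The revised build.xml splits the old `jars` target into `compilelexer`/`compileparser` and `jarlexer`/`jarparser`, with `jar` depending on both jar targets and `test` depending on `jar`. As a rough illustration of what `ant test` would now execute, the sketch below models Ant's documented ordering for `depends` (dependencies visited left to right, depth first, each target run at most once), restricted to the targets touched by this diff. `AntOrder` and its methods are hypothetical names used only for illustration; real Ant also handles `if`/`unless` attributes, cycles and other cases this simplified model ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of Ant's "depends" ordering (not Ant's own code).
final class AntOrder {
    // Dependencies as declared in the revised build.xml, limited to the targets in this diff.
    static final Map<String, List<String>> DEPENDS = new LinkedHashMap<>();
    static {
        DEPENDS.put("compilelexer", List.of());
        DEPENDS.put("compileparser", List.of("compilelexer"));
        DEPENDS.put("jarlexer", List.of("compilelexer"));
        DEPENDS.put("jarparser", List.of("compileparser"));
        DEPENDS.put("jar", List.of("jarlexer", "jarparser"));
        DEPENDS.put("test", List.of("jar"));
    }

    // Visit dependencies left to right, depth first; a target already run is skipped.
    private static void visit(String target, Set<String> done) {
        if (done.contains(target)) {
            return;
        }
        for (String dep : DEPENDS.getOrDefault(target, List.of())) {
            visit(dep, done);
        }
        done.add(target); // insertion order of the LinkedHashSet records execution order
    }

    public static List<String> executionOrder(String target) {
        Set<String> done = new LinkedHashSet<>();
        visit(target, done);
        return new ArrayList<>(done);
    }

    public static void main(String[] args) {
        System.out.println(executionOrder("test"));
    }
}
```

Under this model, `compilelexer` runs once even though both `jarlexer` and `compileparser` depend on it, and `test` runs last: compilelexer, jarlexer, compileparser, jarparser, jar, test.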
From: <der...@us...> - 2003-08-24 18:44:13
Update of /cvsroot/htmlparser/WikiCapturer/src/org/htmlparser/wikicapturer
In directory sc8-pr-cvs1:/tmp/cvs-serv15993/WikiCapturer/src/org/htmlparser/wikicapturer

Modified Files:
	CaptureWiki.java
Log Message:
update Wiki image

Index: CaptureWiki.java
===================================================================
RCS file: /cvsroot/htmlparser/WikiCapturer/src/org/htmlparser/wikicapturer/CaptureWiki.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** CaptureWiki.java	26 Apr 2003 03:47:47 -0000	1.2
--- CaptureWiki.java	24 Aug 2003 18:44:10 -0000	1.3
***************
*** 30,34 ****
          File file = new File ("./");
          System.out.println (file.getAbsolutePath ());
!         captureWiki.captureTo("/home/derrick/htmlparser/htmlparser_cvs/htmlparser/docs/docs/");
      }

--- 30,34 ----
          File file = new File ("./");
          System.out.println (file.getAbsolutePath ());
!         captureWiki.captureTo("/home/derrick/htmlparser_cvs/htmlparser/docs/docs/");
      }

From: <so...@us...> - 2003-08-24 18:43:50
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1:/tmp/cvs-serv15955/src/org/htmlparser/tests/scannersTests

Modified Files:
	ScriptScannerTest.java
Log Message:
removed unused assertion

Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.34
retrieving revision 1.35
diff -C2 -d -r1.34 -r1.35
*** ScriptScannerTest.java	15 Aug 2003 20:51:48 -0000	1.34
--- ScriptScannerTest.java	24 Aug 2003 18:43:47 -0000	1.35
***************
*** 147,153 ****
          assertTrue("Node should be a script tag",node[0] instanceof ScriptTag);

-         // Check the data in the applet tag
-         ScriptTag scriptTag = (ScriptTag)node[0];
-         //assertStringEquals("Expected Script Code",testHTML2,scriptTag.getScriptCode());
      }

--- 147,150 ----